Re: Disable Caching

2012-10-17 Thread Anderson vasconcelos
Thanks for the replies.



2012/10/17 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 If you are not searching against your master, and you shouldn't (and
 it sounds like you aren't), then you don't have to worry about
 disabling caches - they will just remain empty.  You could comment
 them out, but I think that won't actually disable them.

 Warmup queries you can just comment out in solrconfig.xml.

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Wed, Oct 17, 2012 at 12:25 PM, Anderson vasconcelos
 anderson.v...@gmail.com wrote:
  Hi
 
  I have a server that only indexes data and synchronizes it to slaves.
  In my architecture, I have one master server that only receives index
  requests and n slaves that receive only search requests.

  I want to disable the cache on the master server, since it never
  receives search requests. Is this the best way? Can I do this?

  What about warming searches? Must I disable those too?
 
  I'm using solr 3.6.0
 
  Thanks
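
For reference, a minimal solrconfig.xml sketch of what Otis describes; the
cache classes and the commented-out warmup listener follow the stock example
config, and the sizes shown are illustrative assumptions rather than values
from this thread:

<query>
  <!-- On a master that never serves searches the caches stay empty anyway,
       but they can also be sized down explicitly: -->
  <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>

  <!-- Warmup queries are simply commented out on the master: -->
  <!--
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst><str name="q">some warming query</str></lst>
    </arr>
  </listener>
  -->
</query>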



Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
Hi.

In debug mode, the generated query was:

<str name="rawquerystring">field:*2231-7</str>
<str name="querystring">field:*2231-7</str>
<str name="parsedquery">field:*2231-7</str>
<str name="parsedquery_toString">field:*2231-7</str>

The analysis of indexing the text .2231-7 produces this result:
Index Analyzer  .22317  .22317  .22317  .22317  1322. 7 .22317
And the search for *2231-7 produces this result:
Query Analyzer  22317  22317  22317  22317  22317

I don't understand why it doesn't find results when I use field:*2231-7.
When I use field:*2231, without the -7, the document is found.

As Ahmet said, I think the -7 is being used to exclude the document. But the
debug query doesn't show this.

Any ideas on how to solve this?

Thanks


2012/5/18 Ahmet Arslan iori...@yahoo.com



  I have a field that was indexed with the string .2231-7. When I
  search using '*' or '?' like this, *2231-7, the query returns no
  results. When I remove the -7 substring and search again using *2231,
  the query returns results. Finally, when I search using .2231-7, the
  query returns results too.

 Maybe the standard tokenizer is splitting .2231-7 into multiple tokens?
 You can check that on the admin/analysis page.

 Maybe -7 is treated as a negative clause? You can check that with
 debugQuery=on




Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
I changed the field type of the field to the following:

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

As you can see, I just kept the WhitespaceTokenizerFactory. That works. Now I
can find the document using *2231?7, *2231*7, *2231-7, *2231* and .2231-7.

As far as I can see, with this tokenizer the text is not split. Is this the
best way to solve this?

Thanks



2012/5/21 Anderson vasconcelos anderson.v...@gmail.com

 Hi.

 In debug mode, the generated query was:

 <str name="rawquerystring">field:*2231-7</str>
 <str name="querystring">field:*2231-7</str>
 <str name="parsedquery">field:*2231-7</str>
 <str name="parsedquery_toString">field:*2231-7</str>

 The analysis of indexing the text .2231-7 produces this result:
 Index Analyzer  .22317  .22317  .22317  .22317  1322. 7 .22317
 And the search for *2231-7 produces this result:
 Query Analyzer  22317  22317  22317  22317  22317

 I don't understand why it doesn't find results when I use field:*2231-7.
 When I use field:*2231, without the -7, the document is found.

 As Ahmet said, I think the -7 is being used to exclude the document. But the
 debug query doesn't show this.

 Any ideas on how to solve this?

 Thanks


 2012/5/18 Ahmet Arslan iori...@yahoo.com



   I have a field that was indexed with the string .2231-7. When I
   search using '*' or '?' like this, *2231-7, the query returns no
   results. When I remove the -7 substring and search again using *2231,
   the query returns results. Finally, when I search using .2231-7, the
   query returns results too.

  Maybe the standard tokenizer is splitting .2231-7 into multiple tokens?
  You can check that on the admin/analysis page.

  Maybe -7 is treated as a negative clause? You can check that with
  debugQuery=on





Re: Question about wildcards

2012-05-21 Thread Anderson vasconcelos
Thanks all for the explanations.

Anderson

2012/5/21 Jack Krupansky j...@basetechnology.com

 And, generally when I see a field that has values like .2231-7, it
 should be a string field rather than tokenized text. As a string, you can
 then do straight wildcards without surprises.
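
 For reference, a minimal schema.xml sketch of that suggestion; the field
 name code is an assumption for illustration, not taken from the thread:

 <!-- String fields are not tokenized, so the value is indexed verbatim and
      wildcard queries run against the raw value. -->
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
 <field name="code" type="string" indexed="true" stored="true"/>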


 -- Jack Krupansky
 -Original Message- From: Jack Krupansky
 Sent: Monday, May 21, 2012 11:23 AM

 To: solr-user@lucene.apache.org
 Subject: Re: Question about wildcards

 Before Solr 3.6, which added MultiTermAwareComponent for analyzers, the
 presence of a wildcard completely short-circuited (prevented) the
 query-time
 analysis, so you have to manually emulate all steps of the query analyzer
 yourself if you want to do a wildcard. Even with 3.6, not all filters are
 multi-term aware.

 See:
 http://wiki.apache.org/solr/MultitermQueryAnalysis

 Do a query for .2231-7 and that will tell you which analyzer steps
 you
 will have to do manually.

 -- Jack Krupansky

 -Original Message- From: Anderson vasconcelos
 Sent: Monday, May 21, 2012 11:03 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Question about wildcards

 Hi.

 In debug mode, the generated query was:

 <str name="rawquerystring">field:*2231-7</str>
 <str name="querystring">field:*2231-7</str>
 <str name="parsedquery">field:*2231-7</str>
 <str name="parsedquery_toString">field:*2231-7</str>

 The analysis of indexing the text .2231-7 produces this result:
 Index Analyzer  .22317  .22317  .22317  .22317  1322. 7 .22317
 And the search for *2231-7 produces this result:
 Query Analyzer  22317  22317  22317  22317  22317

 I don't understand why it doesn't find results when I use field:*2231-7.
 When I use field:*2231, without the -7, the document is found.

 As Ahmet said, I think the -7 is being used to exclude the document. But the
 debug query doesn't show this.

 Any ideas on how to solve this?

 Thanks


 2012/5/18 Ahmet Arslan iori...@yahoo.com



   I have a field that was indexed with the string .2231-7. When I
   search using '*' or '?' like this, *2231-7, the query returns no
   results. When I remove the -7 substring and search again using *2231,
   the query returns results. Finally, when I search using .2231-7, the
   query returns results too.

  Maybe the standard tokenizer is splitting .2231-7 into multiple tokens?
  You can check that on the admin/analysis page.

  Maybe -7 is treated as a negative clause? You can check that with
  debugQuery=on





Re: Question about cache

2012-05-18 Thread Anderson vasconcelos
Hi Kuli

I was just raising a concern. Thanks for the explanation.

Regards

Anderson

2012/5/11 Shawn Heisey s...@elyograg.org

 On 5/11/2012 9:30 AM, Anderson vasconcelos wrote:

 HI  Kuli

 The free -m command gives me
                     total       used       free     shared    buffers     cached
 Mem:                 9991       9934         57          0         75       5759
 -/+ buffers/cache:               4099       5892
 Swap:                8189       3395       4793

 You can see that there is only 57 MB free and about 5.7 GB cached.

 In the top command, the Glassfish process uses 79.7% of the memory:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  4336 root      21   0 29.7g 7.8g 4.0g S  0.3 79.7  5349:14  java


 If I increase the server's memory by another 2 GB, will the OS use this
 additional 2 GB for cache? Do I need to increase the memory size?


 Are you having a problem you need to track down, or are you just raising a
 concern because your memory usage is not what you expected?

 It is 100% normal for a Linux system to show only a few megabytes of
 memory free.  To make things run faster, the OS caches disk data using
 memory that is not directly allocated to programs or the OS itself.  If a
 program requests memory, the OS will allocate it immediately; it simply
 forgets the least used part of the cache.

 Windows does this too, but Microsoft decided that novice users would freak
 out if the task manager were to give users the true picture of memory
 usage, so they exclude disk cache when calculating free memory.  It's not
 really a lie, just not the full true picture.

 A recent version of Solr (3.5, if I remember right) made a major change in
 the way that the index files are accessed.  The way things are done now is
 almost always faster, but it makes the memory usage in the top command
 completely useless.  The VIRT memory size includes all of your index files,
 plus all the memory that the java process is capable of allocating, plus a
 little that i can't quite account for.  The RES size is also bigger than
 expected, and I'm not sure why.

 Based on the numbers above, I am guessing that your indexes take up
 15-20GB of disk space.  For best performance, you would want a machine with
 at least 24GB of RAM so that your entire index can fit into the OS disk
 cache.  The 10GB you have (which leaves the 5.8 GB for disk cache as you
 have seen) may be good enough to cache the frequently accessed portions of
 your index, so your performance might be just fine.

 Thanks,
 Shawn




Re: Identify indexed terms of document

2012-05-11 Thread Anderson vasconcelos
Thanks

2012/5/11 Michael Kuhlmann k...@solarier.de

 Am 10.05.2012 22:27, schrieb Ahmet Arslan:



   Is it possible to see what terms are indexed for a field of a
  document that has
  stored=false?


  One way is to use http://wiki.apache.org/solr/LukeRequestHandler


 Another approach is this:

 - Query for exactly this document, e.g. by using the unique field
 - Add this to your URL parameters:
  facet=true&facet.field=<your field>&facet.mincount=1

 -Kuli
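
 For reference, the LukeRequestHandler mentioned above is registered in
 solrconfig.xml roughly like this (the handler path is the conventional one
 from the example config):

 <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

 A request such as /solr/admin/luke?fl=<your field>&numTerms=50 then lists
 the top indexed terms for that field.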



Re: Question about cache

2012-05-11 Thread Anderson vasconcelos
HI  Kuli

The free -m command gives me
                    total       used       free     shared    buffers     cached
Mem:                 9991       9934         57          0         75       5759
-/+ buffers/cache:               4099       5892
Swap:                8189       3395       4793

You can see that there is only 57 MB free and about 5.7 GB cached.

In the top command, the Glassfish process uses 79.7% of the memory:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4336 root      21   0 29.7g 7.8g 4.0g S  0.3 79.7  5349:14  java


If I increase the server's memory by another 2 GB, will the OS use this
additional 2 GB for cache? Do I need to increase the memory size?

Thanks





2012/5/11 Michael Kuhlmann k...@solarier.de

 Am 11.05.2012 15:48, schrieb Anderson vasconcelos:

  Hi

  Analyzing the Solr server in Glassfish with JConsole, the heap memory
  usage doesn't exceed 4 GB. But when the top command is executed, the free
  memory in the operating system is only 200 MB. The physical memory is only
  10 GB.

  Why does the machine use so much memory? Are the cache fields included in
  the heap memory usage? Is the other 5.8 GB the operating system's cache for
  recently opened files? Is there some way to tune this?

 Thanks

   If the OS is Linux or some other Unix variant, it keeps as much disk
  content in memory as possible. Whenever new memory is needed, it
  automatically gets freed. That takes no time, and there's no need to tune
  anything.

  Don't look at the free memory in the top command; it's nearly useless. Have
  a look at how much memory your Glassfish process is consuming, and use the
  'free' command (maybe together with the -m parameter for human readability)
  to find out more about your free memory. The
  -/+ buffers/cache line is relevant.

 Greetings,
 Kuli



Re: Question about Streaming Update Solr Server

2012-03-08 Thread Anderson vasconcelos
Could anyone reply to these questions?

Thanks

2012/3/5 Anderson vasconcelos anderson.v...@gmail.com

 Hi

 I have some questions about StreamingUpdateSolrServer.

  1) What is the queue size parameter? Is it the number of documents in each
  thread?

  2) When I configure it like StreamingUpdateSolrServer(URL, 1000, 5),
  indexing runs OK. But when I raise the number of threads, like new
  StreamingUpdateSolrServer(URL, 1000, 15), I receive a
  java.net.SocketException: Broken pipe. Why?

  3) When I index using the addBean method, it opens the maximum number of
  threads that I configured. But when I use addBeans, it opens only one
  thread. Is this correct?


 Thanks


Re: Permissions and user to acess administrative interface

2012-02-13 Thread Anderson vasconcelos
Thanks for the responses. I will create the rules via .htaccess.

Regards

Vasconcelos

2012/2/13 Ge, Yao (Y.) y...@ford.com

 I can only speak from my experience with Tomcat.
 First make sure the authentication modes you need are available by
 checking server.xml.
 I added a few roles in tomcat-users.xml and added individual user
 ids/passwords to these roles. For example, you can separate Search, Update,
 and Admin roles.
 Then modify web.xml to map the different modules to different roles.

 -Yao
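
  For reference, a minimal sketch of that Tomcat setup; the role names, users
  and URL patterns below are illustrative assumptions, not Yao's actual
  configuration:

  <!-- tomcat-users.xml -->
  <tomcat-users>
    <role rolename="solr-search"/>
    <role rolename="solr-admin"/>
    <user username="carter" password="..." roles="solr-search"/>
    <user username="john" password="..." roles="solr-search,solr-admin"/>
  </tomcat-users>

  <!-- web.xml of the Solr webapp: map URL patterns to roles -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>search</web-resource-name>
      <url-pattern>/core1/select/*</url-pattern>
    </web-resource-collection>
    <auth-constraint><role-name>solr-search</role-name></auth-constraint>
  </security-constraint>
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>admin</web-resource-name>
      <url-pattern>/admin/*</url-pattern>
    </web-resource-collection>
    <auth-constraint><role-name>solr-admin</role-name></auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr</realm-name>
  </login-config>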

 -Original Message-
 From: Em [mailto:mailformailingli...@yahoo.de]
 Sent: Monday, February 13, 2012 11:05 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Permissions and user to acess administrative interface

 Hi Anderson,

 you will need to rearrange the JSPs a little bit to do what you want.
 If you do so, you can create rules via .htaccess.

 Otherwise I would suggest you to look for a commercial distribution of
 Solr which might fit your needs.

 Regards,
 Em

 Am 13.02.2012 16:48, schrieb Anderson vasconcelos:
  Hi All
 
   Is there some way to add users and permissions on the Solr administration
   page? I need to restrict users' access to the administration page. I just
   want to expose the query section to certain users. In addition, I want to
   restrict access to the cores per user. Something like this:
 
  Core 1 - Users : John, Paul, Carter
   Full Interface: John, Paul
   Only search interface: Carter
  Core 2  -Users: John , Mary
   Full Interface: John
   Only search interface: Mary
 
  Is that possible?
 
  Thanks
 



Re: Using UUID for uniqueId

2012-02-08 Thread Anderson vasconcelos
Thanks
2012/2/8 François Schiettecatte fschietteca...@gmail.com

 Anderson

  I would say that this is highly unlikely, but you would need to pay
  attention to how they are generated. This would be a good place to start:

http://en.wikipedia.org/wiki/Universally_unique_identifier

 Cheers

 François
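
  For reference, a minimal schema.xml sketch of a UUID unique key;
  solr.UUIDField with default="NEW" generates a value at index time, and the
  field name id is an assumption for illustration:

  <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
  <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>
  <uniqueKey>id</uniqueKey>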

 On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote:

  HI all
 
   If I use a UUID as the uniqueKey and in the future I break my index into
   shards, will I have problems? Could the UUID generation produce the same
   UUID on different machines?
 
  Thanks




Re: Multiple Data Directories and 1 SOLR instance

2012-01-26 Thread Anderson vasconcelos
Nitin,

Use a multicore configuration. For each organization, you create a new core
with specific configurations. You will have one Solr instance and one Solr
Admin tool to manage all cores. The configuration is simple.

Good Luck

Regards

Anderson
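
For reference, a minimal solr.xml sketch of such a multicore setup; the core
names and directories below are illustrative assumptions:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per organization, each with its own conf/ and data dir -->
    <core name="org1" instanceDir="org1" dataDir="/var/solr/data/org1"/>
    <core name="org2" instanceDir="org2" dataDir="/var/solr/data/org2"/>
  </cores>
</solr>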

2012/1/26 David Radunz da...@boxen.net

 Hey,

    Sounds like what you need to set up is a Multiple Cores configuration.
 At first I confused this with Multi Core CPU, but that's not what it's
 about. Basically it's a way to run multiple 'solr'
 cores/indexes/configurations from a single Solr instance (which will scale
 better as the resources will be shared). Have a read anyway:
  http://wiki.apache.org/solr/CoreAdmin

 Cheers,

 David


 On 27/01/2012 8:18 AM, Nitin Arora wrote:

 Hi,

  We are using SOLR/Lucene to index/search data about the users of an
  organization. The nature of the data is brief information about each
  user's work.
  Our indexing requirement is to have a segregated store for each
  organization. Currently we have 10 organizations, and we have to run 10
  different instances of SOLR to serve search results per organization. As
  new organizations join, it is getting difficult to manage this many
  instances.

 I think now there is a need to use 1 SOLR instance and then have
 10/multiple
 different data directories for each organization.

 When index/search request is received in SOLR we decide the data directory
 based on the organization.

1. Is it possible to do the same in SOLR and how can we achieve
 the same?
2. Will it be a good design to use SOLR like this?
3. Is there any impact on the scalability if we are able to manage
 the
 separate data directories inside SOLR?

 Thanks in advance

 Nitin


 --
  View this message in context:
  http://lucene.472066.n3.nabble.com/Multiple-Data-Directories-and-1-SOLR-instance-tp3691644p3691644.html
  Sent from the Solr - User mailing list archive at Nabble.com.





Re: solr replication

2012-01-25 Thread Anderson vasconcelos
Hi Parvin

I did something that may help you. I set up Apache (with mod_proxy and
mod_proxy_balancer) as a front end and use it to distribute my application's
requests. Requests for /update or /optimize are redirected to the master (or
masters) server, and search requests are redirected to the slaves. Example:

<Proxy balancer://solrclusterindex>
BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1
</Proxy>

<Proxy balancer://solrclustersearch>
BalancerMember http://127.0.0.1:8080/apache-solr-1.4.1/ disablereuse=On route=jvm1
BalancerMember http://10.16.129.61:8080/apache-solr-1.4.1/ disablereuse=On route=jvm2
</Proxy>

ProxyPassMatch /solrcluster(.*)/update(.*)$
balancer://solrclusterindex$1/update$2
ProxyPassMatch /solrcluster(.*)/select(.*)$
balancer://solrclustersearch$1/select$2

I hope it helps you


Re: Indexing failover and replication

2012-01-25 Thread Anderson vasconcelos
Thanks for the reply, Erick.
I will handle replication to both masters manually.

Thanks

2012/1/25, Erick Erickson erickerick...@gmail.com:
 No, there is no good way to have a single slave know about
 two masters and just use the right one. It sounds like you've
 got each machine being both a master and a slave? This is
 not supported. What you probably want to do is either set
 up a repeater, or index to the two masters and manually
 switch back to the primary if the primary goes down, having
 all replication happen from the master.

 Best
 Erick
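
 For reference, a minimal sketch of the Solr 1.4 HTTP replication config this
 thread is about; the host name and poll interval are illustrative
 assumptions:

 <!-- solrconfig.xml on the master -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="confFiles">schema.xml,stopwords.txt</str>
   </lst>
 </requestHandler>

 <!-- solrconfig.xml on each slave -->
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="slave">
     <str name="masterUrl">http://master-host:8080/solr/replication</str>
     <str name="pollInterval">00:00:60</str>
   </lst>
 </requestHandler>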

 On Tue, Jan 24, 2012 at 11:36 AM, Anderson vasconcelos
 anderson.v...@gmail.com wrote:
 Hi
 I'm now doing a test with replication using Solr 1.4.1. I configured
 two servers (server1 and server2) as master/slave to synchronize
 both. I put Apache on the front side, and we sometimes index on server1
 and sometimes on server2.

 I realized that both index servers are now confused. In the Solr data
 folder, many index folders were created with the timestamp of the
 synchronization (example: index.20120124041340), with some segments
 inside.

 I thought it was possible to index on two master servers and then
 synchronize both using replication. Is it really possible to do this with
 the replication mechanism? If it is possible, what have I done wrong?

 I need more than one node for indexing to guarantee failover
 for indexing. Is multi-master the best way to guarantee
 failover for indexing?

 Thanks



Re: Size of index to use shard

2012-01-24 Thread Anderson vasconcelos
Apparently it is not so easy to determine when to break the content into
pieces. I'll investigate further the number of documents, the size of each
document, and what kind of search is being used. It seems I will have to do a
load test to identify the cutoff point at which to begin using the shards
strategy.

Thanks

2012/1/24, Dmitry Kan dmitry@gmail.com:
 Hi,

 The article you gave mentions 13GB of index size. It is quite small index
 from our perspective. We have noticed, that at least solr 3.4 has some sort
 of choking point with respect to growing index size. It just becomes
 substantially slower than what we need (a query on avg taking more than 3-4
 seconds) once index size crosses a magic level (about 80GB following our
 practical observations). We try to keep our indices at around 60-70GB for
 fast searches and above 100GB for slow ones. We also route majority of user
 queries to fast indices. Yes, caching may help, but not necessarily we can
 afford adding more RAM for bigger indices. BTW, our documents are very
 small, thus in 100GB index we can have around 200 mil. documents. It would
 be interesting to see, how you manage to ensure q-times under 1 sec with an
 index of 250GB? How many documents / facets do you ask max. at a time? FYI,
 we ask for a thousand of facets in one go.

 Regards,
 Dmitry

 On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann 
 v.kisselm...@googlemail.com wrote:

 Hi,
 it depends on your hardware.
 Read this:

 http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
 Think about your cache-config (few updates, big caches) and a good
 HW-infrastructure.
 In my case I can handle a 250GB index with 100 mil. docs on an i7
 machine with RAID10 and 24GB RAM = q-times under 1 sec.
 Regards
 Vadim



 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
  Hi
   Is there some index size (or number of docs) at which it becomes necessary
   to break the index into shards?
   I have an index of 100GB. This index increases by 10GB per year
   (I don't have information on how many docs it has), and the docs will
   never be deleted. Thinking 30 years ahead, the index will be 400GB
   in size.

   I think it is not required to break it into shards, because I do not
   consider this a large index. Am I correct? What is a real large
   index?
 
 
  Thanks




Re: Size of index to use shard

2012-01-24 Thread Anderson vasconcelos
Thanks for the explanation Erick :)

2012/1/24, Erick Erickson erickerick...@gmail.com:
 Talking about index size can be very misleading. Take
 a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names.
 Note that the *.fdt and *.fdx files are used to for stored fields, i.e.
 the verbatim copy of data put in the index when you specify
 stored=true. These files have virtually no impact on search
 speed.

 So, if your *.fdx and *.fdt files are 90G out of a 100G index
 it is a much different thing than if these files are 10G out of
 a 100G index.

 And this doesn't even mention the peculiarities of your query mix.
 Nor does it say a thing about whether your cheapest alternative
 is to add more memory.

 Anderson's method is about the only reliable one, you just have
 to test with your index and real queries. At some point, you'll
 find your tipping point, typically when you come under memory
 pressure. And it's a balancing act between how much memory
 you allocate to the JVM and how much you leave for the op
 system.

 Bottom line: No hard and fast numbers. And you should periodically
 re-test the empirical numbers you *do* arrive at...

 Best
 Erick

 On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos
 anderson.v...@gmail.com wrote:
 Apparently, not so easy to determine when to break the content into
 pieces. I'll investigate further about the amount of documents, the
 size of each document and what kind of search is being used. It seems,
 I will have to do a load test to identify the cutoff point to begin
 using the strategy of shards.

 Thanks

 2012/1/24, Dmitry Kan dmitry@gmail.com:
 Hi,

 The article you gave mentions 13GB of index size. It is quite small index
 from our perspective. We have noticed, that at least solr 3.4 has some
 sort
 of choking point with respect to growing index size. It just becomes
 substantially slower than what we need (a query on avg taking more than
 3-4
 seconds) once index size crosses a magic level (about 80GB following our
 practical observations). We try to keep our indices at around 60-70GB for
 fast searches and above 100GB for slow ones. We also route majority of
 user
 queries to fast indices. Yes, caching may help, but not necessarily we
 can
 afford adding more RAM for bigger indices. BTW, our documents are very
 small, thus in 100GB index we can have around 200 mil. documents. It
 would
 be interesting to see, how you manage to ensure q-times under 1 sec with
 an
 index of 250GB? How many documents / facets do you ask max. at a time?
 FYI,
 we ask for a thousand of facets in one go.

 Regards,
 Dmitry

 On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann 
 v.kisselm...@googlemail.com wrote:

 Hi,
  it depends on your hardware.
 Read this:

 http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/
 Think about your cache-config (few updates, big caches) and a good
 HW-infrastructure.
  In my case I can handle a 250GB index with 100 mil. docs on an i7
  machine with RAID10 and 24GB RAM = q-times under 1 sec.
 Regards
 Vadim



 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:
  Hi
   Is there some index size (or number of docs) at which it becomes necessary
   to break the index into shards?
   I have an index of 100GB. This index increases by 10GB per year
   (I don't have information on how many docs it has), and the docs will
   never be deleted. Thinking 30 years ahead, the index will be 400GB
   in size.

   I think it is not required to break it into shards, because I do not
   consider this a large index. Am I correct? What is a real large
   index?
 
 
  Thanks





Indexing failover and replication

2012-01-24 Thread Anderson vasconcelos
Hi
I'm now doing a test with replication using Solr 1.4.1. I configured
two servers (server1 and server2) as master/slave to synchronize
both. I put Apache on the front side, and we sometimes index on server1
and sometimes on server2.

I realized that both index servers are now confused. In the Solr data
folder, many index folders were created with the timestamp of the
synchronization (example: index.20120124041340), with some segments
inside.

I thought it was possible to index on two master servers and then
synchronize both using replication. Is it really possible to do this with
the replication mechanism? If it is possible, what have I done wrong?

I need more than one node for indexing to guarantee failover
for indexing. Is multi-master the best way to guarantee
failover for indexing?

Thanks


Size of index to use shard

2012-01-23 Thread Anderson vasconcelos
Hi
Is there some index size (or number of docs) at which it becomes necessary
to break the index into shards?
I have an index of 100GB. This index increases by 10GB per year
(I don't have information on how many docs it has), and the docs will
never be deleted. Thinking 30 years ahead, the index will be 400GB
in size.

I think it is not required to break it into shards, because I do not
consider this a large index. Am I correct? What is a real large
index?


Thanks


Re: Phonetic search for portuguese

2012-01-22 Thread Anderson vasconcelos
Could anyone help?

Thanks

2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:
 Hi

 Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex,
 Caverphone) only for the English language, or do they work for other
 languages? Is there a phonetic filter for Portuguese? If not, how can I
 implement one?

 Thanks



Re: Phonetic search for portuguese

2012-01-22 Thread Anderson vasconcelos
Hi Gora, thanks for the reply.

I'm interested in seeing how you built this solution. But my time is
short, and I need to create a solution for my client soon. If
anyone knows another simple and fast solution, please post it in this
thread.

Gora, could you describe how you implemented the custom filter factory and
how you used it in Solr?

Thanks


2012/1/22, Gora Mohanty g...@mimirtech.com:
 On Sun, Jan 22, 2012 at 5:47 PM, Anderson vasconcelos
 anderson.v...@gmail.com wrote:
  Could anyone help?

 Thanks

 2012/1/20, Anderson vasconcelos anderson.v...@gmail.com:
 Hi

  Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex,
  RefinedSoundex, Caverphone) only for the English language, or do they work
  for other languages? Is there a phonetic filter for Portuguese? If not,
  how can I implement one?

 We did this, in another context, by using the open-source aspell library to
 handle the spell-checking for us. This has distinct advantages as aspell
 is well-tested, handles soundslike in a better manner at least IMHO, and
 supports a wide variety of languages, including Portugese.

 There are some drawbacks, as aspell only has C/C++ interfaces, and
 hence we built bindings on top of SWIG. Also, we handled the integration
 with Solr via a custom filter factory, though there are better ways to do
 this.
 Such a project would thus, have dependencies on aspell, and our custom
 code. If there is interest in this, we would be happy to open source this
 code: Given our current schedule this could take 2-3 weeks.

 Regards,
 Gora



Re: Phonetic search for portuguese

2012-01-22 Thread Anderson vasconcelos
Thanks a lot, Gora.
I need to deliver the first release for my client on 25 January.
With your explanation, I can better negotiate the delivery date of
this feature for next month, because I have other business rules to
deliver and this feature is more complex than I thought.
I can help you share this solution with the Solr community. Maybe we
can create a component on Google Code, or something like that, which
any Solr user can use.

2012/1/23, Gora Mohanty g...@mimirtech.com:
 On Mon, Jan 23, 2012 at 5:58 AM, Anderson vasconcelos
 anderson.v...@gmail.com wrote:
 Hi Gora, thanks for the reply.

 I'm interesting in see how you did this solution. But , my time is not
 to long and i need to create some solution for my client early. If
 anyone knows some other simple and fast solution, please post on this
 thread.

 What is your time line? I will see if we can expedite the open
 sourcing of this.

 Gora, you could talk how you implemented the Custom Filter Factory and
 how used this on SOLR?
 [...]

 That part is quite simple, though it is possible that I have not
 correctly addressed all issues for a custom FilterFactory.
 Please see:
   AspellFilterFactory: http://pastebin.com/jTBcfmd1
   AspellFilter:http://pastebin.com/jDDKrPiK

 The latter loads a java_aspell library that is created by SWIG
 by setting up Java bindings on top of SWIG, and configuring
 it for the language of interest.

 Next, you will need a library that encapsulates various
 aspell functionality in Java. I am afraid that this is a little
 long:
   Suggest: http://pastebin.com/6NrGCVma

 Finally, you will have to set up the Solr schema to use
 this filter factory, e.g., one could create a new Solr
 TextField, where the solr.DoubleMetaphoneFilterFactory
 is replaced with
 com.mimirtech.search.solr.analysis.AspellFilterFactory

 We can discuss further how to set this up, but should
 probably take that discussion off-list.

 Regards,
 Gora



Re: HIbernate Search and SOLR Integration

2012-01-20 Thread Anderson vasconcelos
Otis,
Isn't the DataImportHandler only for importing data from a database? I don't
want to import data from the database. I just want to persist the object in
my database and afterwards send this saved object to Solr. When the user finds
some document using the Solr search, I need to return this persistent
object (the one found in Solr, with the contents saved in the database). Is it
possible to do this with the DataImportHandler? If not, is there another
solution, or do I have to do this merge in my application using an in clause
or a temporary table?

Thanks

2012/1/20 Otis Gospodnetic otis_gospodne...@yahoo.com

 Hi Anderson,

 Not sure if you saw http://wiki.apache.org/solr/DataImportHandler


 Otis

 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html



 - Original Message -
  From: Anderson vasconcelos anderson.v...@gmail.com
  To: solr-user solr-user@lucene.apache.org
  Cc:
  Sent: Thursday, January 19, 2012 10:08 PM
  Subject: HIbernate Search and SOLR Integration
 
  Hi.
 
  It's possible to integrate Hibernate Search with SOLR? I wanna use
  Hibernate Search in my entities and use SOLR to make the work of index
  and search. Hibernate Search call SOLR to find in index and than find
  the respective objects in database. Is that possible? Exists some
  configuration for this?
 
  If it's not possible, whats the best strategy to unify the search on
  index with search in database using SOLR? Manually join of results
  from index in database query using temporary table or in clause?
 
  Thanks
 



Re: HIbernate Search and SOLR Integration

2012-01-20 Thread Anderson vasconcelos
OK. I thought there was an easier way to do this using Hibernate Search. I
will do this manually.

Thanks for help



2012/1/20 Otis Gospodnetic otis_gospodne...@yahoo.com

 Hi,

 If you save all fields you want to display in search results, then you
 don't need to go to the database at search time.
 If you do not save all fields you want to display in search results, then
 you will need to first query Solr, get IDs of all matches you want to
 display, and then from your application do a SELECT with those IDs.

 DataImportHandler is for indexing data from DB and is not used at
 search-time.

 HTH

 Otis
 
 Performance Monitoring SaaS for Solr -
 http://sematext.com/spm/solr-performance-monitoring/index.html


 - Original Message -
  From: Anderson vasconcelos anderson.v...@gmail.com
  To: solr-user@lucene.apache.org; Otis Gospodnetic 
 otis_gospodne...@yahoo.com
  Cc:
  Sent: Friday, January 20, 2012 8:33 AM
  Subject: Re: HIbernate Search and SOLR Integration
 
  Otis,
  The DataImportHandler is not only for import data from database? I don't
  wanna to import data from database. I just wanna to persist the object in
  my database and after send this saved object to SOLR. When the user find
  some document  using the SOLR search, i need to return this persistent
  object (That was found in SOLR with the contents saved in database). It's
  possible do this with DataImporHandler? If not possible, has other
 solution
  or i have to make this merge in my aplication using in
  clause or
  temporary table?
 
  Thanks
 
  2012/1/20 Otis Gospodnetic otis_gospodne...@yahoo.com
 
   Hi Anderson,
 
   Not sure if you saw http://wiki.apache.org/solr/DataImportHandler
 
 
   Otis
 
   
   Performance Monitoring SaaS for Solr -
   http://sematext.com/spm/solr-performance-monitoring/index.html
 
 
 
   - Original Message -
From: Anderson vasconcelos anderson.v...@gmail.com
To: solr-user solr-user@lucene.apache.org
Cc:
Sent: Thursday, January 19, 2012 10:08 PM
Subject: HIbernate Search and SOLR Integration
   
Hi.
   
It's possible to integrate Hibernate Search with SOLR? I wanna use
Hibernate Search in my entities and use SOLR to make the work of
 index
and search. Hibernate Search call SOLR to find in index and than find
the respective objects in database. Is that possible? Exists some
configuration for this?
   
If it's not possible, whats the best strategy to unify the search
  on
index with search in database using SOLR? Manually join of results
from index in database query using temporary table or in clause?
   
Thanks
   
 
 



Phonetic search for portuguese

2012-01-20 Thread Anderson vasconcelos
Hi

Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex,
Caverphone) only for the English language, or do they work for other languages?
Is there a phonetic filter for Portuguese? If not, how can I implement one?

Thanks


HIbernate Search and SOLR Integration

2012-01-19 Thread Anderson vasconcelos
Hi.

Is it possible to integrate Hibernate Search with Solr? I want to use
Hibernate Search on my entities and use Solr to do the work of indexing
and searching: Hibernate Search calls Solr to search the index and then finds
the respective objects in the database. Is that possible? Is there some
configuration for this?

If it is not possible, what is the best strategy to unify searching the
index with searching the database using Solr? A manual join of the results
from the index in a database query using a temporary table or an in clause?

Thanks


Re: Migrate Lucene 2.9 To SOLR

2011-12-15 Thread Anderson vasconcelos
OK. Thanks for the help. I'm going to try the migration.



2011/12/14 Chris Hostetter hossman_luc...@fucit.org


 : I have an old project that uses Lucene 2.9. Is it possible to use the index
 : created by Lucene in Solr? May I just copy the index to the data directory
 : of Solr, or is there some mechanism to import a Lucene index?

 you can use an index created directly with lucene libraries in Solr, but
 in order for Solr to understand that index and do anything meaningful with
 it you have to configure solr with a schema.xml file that makes sense
 given the custom code used to build that index (ie: what fields did you
 store, what fields did you index, what analyzers did you use, what fields
 did you index with term vectors, etc.)


 -Hoss
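
 For reference, a minimal schema.xml sketch of what Hoss describes; every
 field name, type and flag here is hypothetical and must mirror whatever the
 old Lucene 2.9 code actually did when it built the index:

 <schema name="legacy" version="1.1">
   <types>
     <fieldType name="string" class="solr.StrField"/>
     <!-- the analyzer must match the one the old Lucene code used -->
     <fieldType name="text" class="solr.TextField">
       <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
     </fieldType>
   </types>
   <fields>
     <field name="id"   type="string" indexed="true" stored="true"/>
     <field name="body" type="text"   indexed="true" stored="false"/>
   </fields>
   <uniqueKey>id</uniqueKey>
 </schema>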



Migrate Lucene 2.9 To SOLR

2011-12-13 Thread Anderson vasconcelos
Hi

I have an old project that uses Lucene 2.9. Is it possible to use the index
created by Lucene in Solr? May I just copy the index to the data directory of
Solr, or is there some mechanism to import a Lucene index?

Thanks


Export Index Data.

2010-11-19 Thread Anderson vasconcelos
Hi
Is it possible to export a set of documents indexed on one Solr server in
order to synchronize it with another Solr server?

Thanks


Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Hi all
When I send a delete query to Solr using SolrJ, I receive this
exception:

org.apache.solr.client.solrj.SolrServerException: java.net.SocketException:
Too many open files
11:53:06,964 INFO  [HttpMethodDirector] I/O exception
(java.net.SocketException) caught when processing request: Too many open
files

Could anyone help me? How can I solve this?

Thanks


Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Thanks for the responses.
I instantiate one instance per request (per delete query, in my case).
I have a lot of concurrent processes. If I reuse the same instance (to send,
delete and remove data) in Solr, will I have trouble?
My concern is that if I do this, Solr will commit documents with data from
other transactions.

Thanks




2010/6/28 Michel Bottan freakco...@gmail.com

 Hi Anderson,

 If you are using SolrJ, it's recommended to reuse the same instance per
 solr
 server.

 http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer

 But there are other scenarios which may cause this situation:

 1. Other application running in the same Solr JVM which doesn't close
 properly sockets or control file handlers.
 2. Open files limits configuration is low . Check your limits, read it from
 JVM process info:
 cat /proc/1234/limits (where 1234 is your process ID)

 Cheers,
 Michel Bottan


 On Mon, Jun 28, 2010 at 1:18 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  This probably means you're opening new readers without closing
  old ones. But that's just a guess. I'm guessing that this really
  has nothing to do with the delete itself, but the delete is what's
  finally pushing you over the limit.
 
  I know this has been discussed before, try searching the mail
  archive for TooManyOpenFiles and/or File Handles
 
  You could get much better information by providing more details, see:
 
 
 
   http://wiki.apache.org/solr/UsingMailingLists?highlight=%28most%29%7C%28users%29%7C%28list%29
 
 
  Best
  Erick
 
  On Mon, Jun 28, 2010 at 11:56 AM, Anderson vasconcelos 
  anderson.v...@gmail.com wrote:
 
   Hi all
   When i send a delete query to SOLR, using the SOLRJ i received this
   exception:
  
   org.apache.solr.client.solrj.SolrServerException:
  java.net.SocketException:
   Too many open files
   11:53:06,964 INFO  [HttpMethodDirector] I/O exception
   (java.net.SocketException) caught when processing request: Too many
 open
   files
  
   Anyone could Help me? How i can solve this?
  
   Thanks
  
 



Re: Too Many Open Files

2010-06-28 Thread Anderson vasconcelos
Another question:
why doesn't SolrJ close the StringWriter and OutputStreamWriter?

thanks

2010/6/28 Anderson vasconcelos anderson.v...@gmail.com

 Thanks for responses.
 I instantiate one instance of  per request (per delete query, in my case).
 I have a lot of concurrency process. Reusing the same instance (to send,
 delete and remove data) in solr, i will have a trouble?
 My concern is if i do this, solr will commit documents with data from other
 transaction.

 Thanks




 2010/6/28 Michel Bottan freakco...@gmail.com

 Hi Anderson,

 If you are using SolrJ, it's recommended to reuse the same instance per
 solr
 server.

 http://wiki.apache.org/solr/Solrj#CommonsHttpSolrServer

 But there are other scenarios which may cause this situation:

 1. Other application running in the same Solr JVM which doesn't close
 properly sockets or control file handlers.
 2. Open files limits configuration is low . Check your limits, read it
 from
 JVM process info:
 cat /proc/1234/limits (where 1234 is your process ID)

 Cheers,
 Michel Bottan


 On Mon, Jun 28, 2010 at 1:18 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  This probably means you're opening new readers without closing
  old ones. But that's just a guess. I'm guessing that this really
  has nothing to do with the delete itself, but the delete is what's
  finally pushing you over the limit.
 
  I know this has been discussed before, try searching the mail
  archive for TooManyOpenFiles and/or File Handles
 
  You could get much better information by providing more details, see:
 
 
 
  http://wiki.apache.org/solr/UsingMailingLists?highlight=%28most%29%7C%28users%29%7C%28list%29
 
 
  Best
  Erick
 
  On Mon, Jun 28, 2010 at 11:56 AM, Anderson vasconcelos 
  anderson.v...@gmail.com wrote:
 
   Hi all
   When i send a delete query to SOLR, using the SOLRJ i received this
   exception:
  
   org.apache.solr.client.solrj.SolrServerException:
  java.net.SocketException:
   Too many open files
   11:53:06,964 INFO  [HttpMethodDirector] I/O exception
   (java.net.SocketException) caught when processing request: Too many
 open
   files
  
   Anyone could Help me? How i can solve this?
  
   Thanks
  
 





Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-14 Thread Anderson vasconcelos
Thanks for the help.
The fields are just for filtering my data. They are client_id and instance_id.
When I index my data, I put in the identifier of the client (because my
application is multi-client). When I search in Solr, I want to find the docs
where client_id:1, for example.
With the field as a string, this works. When I saw that I could make the
field a long, I thought that could be a best practice. But my trouble is that
I already have many docs indexed.

Since changing to long now is a bad idea, I will keep the field as a string
type. (Correct me if I am wrong.)

Thanks


2010/5/13 Erick Erickson erickerick...@gmail.com

 This is probably a bad idea. You're getting by on backwards
 compatibility stuff, I'd really recommend that you reindex your
 entire corpus, possibly getting by on what you already have
 until you can successfully reindex.

 Have a look at trie fields (this is detailed in the example
 schema.xml). Here's another place to look:

 http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/

 You also haven't told us what you want to do with
 the field, so making recommendations is difficult.

 Best
 Erick
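
 For reference, a sketch of the trie long type from the example schema Erick
 points to; the precisionStep shown is the example default, and the field
 name is taken from the thread:

 <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
 <field name="client_id" type="tlong" indexed="true" stored="true" required="true"/>

 Note that switching an existing field from string to a trie type still
 requires reindexing the documents.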

 On Thu, May 13, 2010 at 5:19 PM, Anderson vasconcelos 
 anderson.v...@gmail.com wrote:

  Hi Erick.
  I put in my schema.xml fields with type string. The system go to te
  production, and now i see that the field must be a long field.
 
  When i change the fieldtype to long, show the error
  ERROR:SCHEMA-INDEX-MISMATCH when i search by solr admin.
 
  I Put plong, and this works. This is the way that i must go on? (This
  could generate a trouble in the future?)
 
  What's the advantages to set the field type to long? I must mantain this
  field in string type?
 
  Thanks
 
  2010/5/13 Erick Erickson erickerick...@gmail.com
 
   Not at present, you must re-index your documents when you redefine your
   schema
   to change existing documents.
  
   Field updating of documents already indexed is being worked on, but
 it's
   not
   available yet.
  
   Best
   Erick
  
   On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos 
   anderson.v...@gmail.com wrote:
  
Hi All.
   
     I have the following fields in my schema:
     <field name="uuid_field" type="uuid" indexed="true" stored="true" default="NEW"/>
     <field name="entity_instance_id" type="plong" indexed="true" stored="true" required="true"/>
     <field name="child_instance_id" type="plong" indexed="true" stored="true" required="true"/>
     <field name="client_id" type="plong" indexed="true" stored="true" required="true"/>
     <field name="indexing_date" type="date" default="NOW" multiValued="false" indexed="true" stored="true"/>
     <field name="field_name" type="textgen" indexed="true" stored="true" required="false"/>
     <field name="value" type="textgen" indexed="true" stored="false" required="false"/>
   
I need to change the index of SOLR, adding a dynamic field that will
contains all values of value field. Its possible to get all index
  data
and
reindex, putting the values on my dynamic field?
   
How the data was no stored, i don't find one way to do this
   
Thanks
   
  
 



Connection Pool

2010-05-14 Thread Anderson vasconcelos
Hi
I want to know if there is any connection pool client to manage the
connections with Solr. In my system, we have a lot of concurrent index
requests. I can't share my connection; I need to create one per transaction.
But if I create one per transaction, I think performance will drop.

How do you resolve this problem?

Thanks


SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Anderson vasconcelos
Hi All.

I have the following fields in my schema:
<field name="uuid_field" type="uuid" indexed="true" stored="true" default="NEW"/>
<field name="entity_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="child_instance_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="client_id" type="plong" indexed="true" stored="true" required="true"/>
<field name="indexing_date" type="date" default="NOW" multiValued="false" indexed="true" stored="true"/>
<field name="field_name" type="textgen" indexed="true" stored="true" required="false"/>
<field name="value" type="textgen" indexed="true" stored="false" required="false"/>

I need to change the Solr index, adding a dynamic field that will
contain all values of the value field. Is it possible to get all the indexed
data and reindex it, putting the values into my dynamic field?

Since the data was not stored, I can't find a way to do this.

Thanks


SolrUser - Reindex

2010-05-13 Thread Anderson vasconcelos
Why does Solr/Lucene not index the character '@'?

I send email fields such as x...@gmail.com to be indexed, and afterwards try
to search with to_email:*...@*, and nothing is found.

Do I need to do some configuration?

Thanks


Re: SolrUser - ERROR:SCHEMA-INDEX-MISMATCH

2010-05-13 Thread Anderson vasconcelos
Hi Erick.
I put fields of type string in my schema.xml. The system went into
production, and now I see that the field must be a long field.

When I change the field type to long, the error
ERROR:SCHEMA-INDEX-MISMATCH shows up when I search via the Solr admin.

I put plong, and this works. Is this the way I must go? (Could this
generate trouble in the future?)

What are the advantages of setting the field type to long? Or should I keep
this field as a string type?

Thanks

2010/5/13 Erick Erickson erickerick...@gmail.com

 Not at present, you must re-index your documents when you redefine your
 schema
 to change existing documents.

 Field updating of documents already indexed is being worked on, but it's
 not
 available yet.

 Best
 Erick

 On Thu, May 13, 2010 at 3:58 PM, Anderson vasconcelos 
 anderson.v...@gmail.com wrote:

  Hi All.
 
   I have the following fields in my schema:
   <field name="uuid_field" type="uuid" indexed="true" stored="true" default="NEW"/>
   <field name="entity_instance_id" type="plong" indexed="true" stored="true" required="true"/>
   <field name="child_instance_id" type="plong" indexed="true" stored="true" required="true"/>
   <field name="client_id" type="plong" indexed="true" stored="true" required="true"/>
   <field name="indexing_date" type="date" default="NOW" multiValued="false" indexed="true" stored="true"/>
   <field name="field_name" type="textgen" indexed="true" stored="true" required="false"/>
   <field name="value" type="textgen" indexed="true" stored="false" required="false"/>
 
   I need to change the Solr index, adding a dynamic field that will
   contain all values of the value field. Is it possible to get all the
   indexed data and reindex it, putting the values into my dynamic field?

   Since the data was not stored, I can't find a way to do this.
 
  Thanks
 



Re: SolrUser - Reindex

2010-05-13 Thread Anderson vasconcelos
I'm using the textgen field type on my field, as follows:
<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

...
<dynamicField name="field_value_*" type="textgen" indexed="true" stored="true"/>
...

These do not remove the @ symbol. To configure it to index the @ symbol, must
I use the HTMLStripStandardTokenizerFactory?

Thanks

2010/5/13 Erick Erickson erickerick...@gmail.com

 Probably your analyzer is removing the @ symbol, it's hard to say if you
 don't include the relevant parts of your schema.

 This page might help:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

  Best
  Erick

 On Thu, May 13, 2010 at 3:59 PM, Anderson vasconcelos 
 anderson.v...@gmail.com wrote:

  Why solr/lucene no index the Character '@' ?
 
  I send to index email fields x...@gmail.com ...and after try do search
  to_email:*...@*, and not found.
 
  I need to do some configuration?
 
  Thanks
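
For reference, a sketch of two ways to keep the @ intact; the field and type
names here are illustrative assumptions. The WordDelimiterFilterFactory in the
textgen chain above splits tokens on non-alphanumeric characters such as @,
which is why the wildcard search on the email value fails:

<!-- option 1: an untokenized string field, the @ is indexed verbatim -->
<field name="to_email" type="string" indexed="true" stored="true"/>

<!-- option 2: keep the whole value as one token but lowercase it,
     then declare to_email with type="email_keyword" instead -->
<fieldType name="email_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>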