Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Nitin Solanki
Does anyone have an answer to the question I asked on 20th Jan 2015 at 7:48 PM? On Tue, Jan 20, 2015 at 11:59 PM, Nitin Solanki wrote: > Okay. No problem. Could somebody please check the question I mailed on > 20th Jan 2015 at 7:48 PM, where I posted it along with 2 > attachments.

RE: shards per disk

2015-01-20 Thread Nimrod Cohen
Hi Toke, Thanks for your answer. We are using RAID 0 of 8 disks; I don't understand why it should give me the same performance as disk per drive. Below is an explanation as I see it; please correct me if I'm wrong. RAID configuration: each shard has data on each one of the 8 disks in the RAID,

Re: How to index data from multiple data source

2015-01-20 Thread Alvaro Cabrerizo
Hi, You can find several examples of configuring Tika + DIH to index PDFs on the internet (e.g. https://tuxdna.wordpress.com/2013/02/04/indexing-the-documents-stored-in-a-database-using-apache-solr-and-apache-tika/ ) Regards. On Jan 21, 2015 6:54 AM, "Yusniel Hidalgo Delgado" wrote: > > > Dear Solr co

AW: AW: transactions@Solr(J)

2015-01-20 Thread Clemens Wyss DEV
But then what happens if: autocommit is set to 10 docs and I add 11 docs and then decide (due to an exception?) to roll back. Will only one (i.e. the last added) document be rolled back? -Original Message- From: Michael Sokolov [mailto:msoko...@safaribooksonline.com] Sent: Tues

How to index data from multiple data source

2015-01-20 Thread Yusniel Hidalgo Delgado
Dear Solr community, I have recently started diving into Solr and I need help with the following usage scenario. I am working on a project to extract and search bibliographic metadata from PDF files. First, my PDF files are processed to extract bibliographic metadata such as title, authors, affilia

Re: How to make edge_ngram work with number, underscores, dashes and space

2015-01-20 Thread Alexandre Rafalovitch
So, try the suggested tokenizers and dump the ngrams from query. See what happens. Ask a separate question with corrected config/output if you still have issues. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 20 January 2015 at 23:08, Vishal Swar

Re: How to make edge_ngram work with number, underscores, dashes and space

2015-01-20 Thread Vishal Swaroop
Thanks for the response. a) I am trying to make it case-insensitive; itemName data is indexed in upper case. b) I am looking to display the results as type-ahead suggestions, which might include spaces, underscores, numbers... - "ABC12DE": it does not work as soon as I type 1, i.e. ABC1 Output ex

Index data of XML file

2015-01-20 Thread Quen Aki
Hi, I'm using Apache Solr 4.9.0 and ManifoldCF 1.6.1. I can't generate an index of XML files that includes tags and attributes. Is it possible to achieve this through settings in schema.xml or solrconfig.xml? Can anyone help me? Regards, Aki

Re: How to make edge_ngram work with number, underscores, dashes and space

2015-01-20 Thread Alexandre Rafalovitch
Were you actually trying to "...divides text at non-letters and converts them to lower case"? Or were you trying to make it case-insensitive, which would be KeywordTokenizer and LowerCaseFilter? Also, normally we do not use the NGram filter on both index and query. That just makes things to match on
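To make the distinction above concrete, here is a rough Python sketch (not Solr code) of what an index-time chain of KeywordTokenizer, LowerCaseFilter and an edge n-gram filter would emit for one itemName value, with the query side only lowercased. The function names `edge_ngrams` and `matches_prefix` and the gram sizes are assumptions made for illustration; the real sizes come from the field type definition.

```python
def edge_ngrams(value, min_gram=1, max_gram=15):
    """Mimic keyword tokenizing + lowercasing + front edge n-grams.

    The whole value is kept as one token (spaces, digits, underscores and
    dashes included), lowercased, then expanded into its leading prefixes.
    """
    token = value.lower()
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


def matches_prefix(indexed_value, typed_text):
    """Query side: only lowercase the typed text, do not n-gram it."""
    return typed_text.lower() in edge_ngrams(indexed_value)


if __name__ == "__main__":
    print(edge_ngrams("ABC12DE"))
    print(matches_prefix("ABC_12DE", "abc_1"))
```

Because the value is never split at digits or underscores in this model, a typed prefix such as ABC1 or ABC_1 lines up with one of the indexed grams.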

How to make edge_ngram work with number, underscores, dashes and space

2015-01-20 Thread Vishal Swaroop
Hi, Maybe this is basic, but I am trying to understand which tokenizer and filter to use. I followed some examples mentioned in the Solr wiki, but type-ahead does not show the expected suggestions. Example itemName data can be: - "ABC12DE": it does not work as soon as I type 1, i.e. ABC1 - "ABC_12DE

Re: Solr Users Mailing lists in languages other than English?

2015-01-20 Thread Alexandre Rafalovitch
I am cool with that. Just wanted to check that there was not one hiding around. Also, ElasticSearch has a couple of language-specific groups and at least Russian one gets some traffic every couple of weeks or so. Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-s

Re: Solr DIH using JDBC with TIKA

2015-01-20 Thread dboychuck
Got it working with the updated config: -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-DIH-using-JDBC-with-TIKA-tp4180737p4180742.html Sent from the S

Solr DIH using JDBC with TIKA

2015-01-20 Thread dboychuck
I'm trying to index certain data from a table and documents located on disk using jdbc and tika. I can derive the file locations from the table and using that data I want to also import documents into Solr. However I'm having trouble with my configuration.

Re: Newly observed Facets

2015-01-20 Thread harish singh
Thanks Alvaro. That worked. On Tue, Jan 20, 2015 at 9:59 AM, harish singh wrote: > ok. So I am trying this query: > > > http://cluster1.com:8983/solr/my_collection_shard4_replica1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=userName&fq=startTimeISO:[NOW-1DAY%20TO%20NOW]&fq=-

Re: Filter Solr multivalued fields to be able to add pagination

2015-01-20 Thread Alvaro Cabrerizo
Hi, Currently, there is no way to sort by a multi-valued field within Solr (first the system would have to sort the content of the field, then sort the documents...). Anyway, if you have a clear idea of how the sort should be done, try to accommodate your data to your needs (in case it is possible). One option

Re: How to return custom collector info

2015-01-20 Thread tedsolr
Joel, Thank you for the links. The AnalyticsQuery is just the thing I need to return custom stats in the response. What I'm struggling with now, is how to read the doc field values. I've been following the CollapsingQParserPlugin model of accessing the field cache in the Query class getAnalyticsC

Re: Connection Reset Errors with Solr 4.4

2015-01-20 Thread Nishanth S
Thank you Mike. Sure enough, we are running into the same issue you mentioned. Is there a quick fix for this other than the patch? I do not see the tlogs getting replayed at all. It is doing a full index recovery from the leader and our index size is around 200G. Would lowering the autocommit settings he

Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Nitin Solanki
Okay. No problem. Could somebody please check the question I mailed on 20th Jan 2015 at 7:48 PM, where I posted it along with 2 attachments. I am also waiting for Shalin, in case he is able to answer. On Tue, Jan 20, 2015 at 11:49 PM, Shawn Heisey wrote: > On 1/20/2015 11:11 AM, Niti

Re: Leaders in Recovery Failed state

2015-01-20 Thread Nitin Solanki
I am also facing the same issue. My solr version is 4.10.2 On Tue, Jan 20, 2015 at 11:33 PM, Erick Erickson wrote: > What version of Solr? > > > On Tue, Jan 20, 2015 at 7:07 AM, anand.mahajan > wrote: > > Hi all, > > > > > > I have a cluster with 36 Shards and 3 replica per shard. I had to > re

Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Shawn Heisey
On 1/20/2015 11:11 AM, Nitin Solanki wrote: > Thanks a lot Shawn. Is there any way to reduce the time it takes to retrieve > suggestions? I know almost nothing about how to use the suggester and spellcheck features of Solr. I do know that the suggester is based on spellcheck. I have a spellcheck config

Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Nitin Solanki
Thanks a lot Shawn. Is there any way to reduce the time it takes to retrieve suggestions? On Tue, Jan 20, 2015 at 9:33 PM, Shawn Heisey wrote: > On 1/20/2015 7:18 AM, Nitin Solanki wrote: > > Thanks and sorry for Stackoverflow. You are saying that use "string" > > type. But I have used filter = solr.Shi

Re: Leaders in Recovery Failed state

2015-01-20 Thread Erick Erickson
What version of Solr? On Tue, Jan 20, 2015 at 7:07 AM, anand.mahajan wrote: > Hi all, > > > I have a cluster with 36 shards and 3 replicas per shard. I had to recently > restart the entire cluster - most of the shards & replicas are back up - but > a few shards have not had any leaders for a long

Re: Newly observed Facets

2015-01-20 Thread harish singh
ok. So I am trying this query: http://cluster1.com:8983/solr/my_collection_shard4_replica1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=userName&fq=startTimeISO:[NOW-1DAY%20TO%20NOW]&fq=-_query_:%22{!join%20from=% userName%20to=%userName}startTimeISO:[NOW-30DAYS%20TO%20NOW-1DA

Re: shards per disk

2015-01-20 Thread Roman Chyla
I think this (i.e. the setup) makes sense: since the search is getting 1K documents each time (for textual analysis, i.e. they are probably large docs) and uses Solr as storage (which is totally fine), the parallel multi-drive I/O across shards speeds things up. The index is probably large, so it

Re: Newly observed Facets

2015-01-20 Thread Alvaro Cabrerizo
Hi, In case your data looks like: "id": "1", "userName": "one", "startTimeISO": "2015-01-20T17:24:32.888Z" "id": "2", "userName": "one", "startTimeISO": "2015-01-16T17:24:50.208Z" "id": "3", "userName": "two", "startTimeISO": "2015-01-20T17:25:06.109Z" You could use the next query combination

Re: Connection Reset Errors with Solr 4.4

2015-01-20 Thread Mike Drob
Are we sure this isn't SOLR-6931? On Tue, Jan 20, 2015 at 11:39 AM, Nishanth S wrote: > Hello All, > > We are running SolrCloud 4.4 with 30 shards and 3 replicas with real-time > indexing on RHEL 6.5. The indexing rate is 3K TPS now. We are running into an > issue with replicas going into recover

RE: shards per disk

2015-01-20 Thread Toke Eskildsen
Nimrod Cohen [nimrod.co...@nice.com] wrote: > We need to get 1K documents out of 100M documents each > time we query Solr and send them to text analysis. > The first configuration had 8 shards on one RAID (Disk F); we > got the 1K in around 15 seconds. > In the second configuration we removed the RAID and work o

Connection Reset Errors with Solr 4.4

2015-01-20 Thread Nishanth S
Hello All, We are running SolrCloud 4.4 with 30 shards and 3 replicas with real-time indexing on RHEL 6.5. The indexing rate is 3K TPS now. We are running into an issue with replicas going into recovery mode due to connection reset errors. Soft commit time is 2 min and auto commit is set as 5 minut

Filter Solr multivalued fields to be able to add pagination

2015-01-20 Thread PeterKerk
I have the Solr XML response below using this query: http://localhost:8983/solr/tt/select/?indent=off&facet=false&wt=xml&fl=title,overallscore,service,reviewdate&q=*:*&fq=id:315&start=0&rows=4&sort=reviewdate%20desc I want to add paging on the multivalued fields, but the above query throws the err

Re: Order synonyms

2015-01-20 Thread Antoine REBOUL
Hello, and thank you for your answer (sorry for my English, I use a translator). I tried to use the elevate.xml file by adding: where 271 is the unique identifier of the merchant Apple. To try to get it taken into account, I pass the following parameters: &force

MultiPhraseQuery:Rewrite to BooleanQuery

2015-01-20 Thread ku3ia
Hi folks! I have a multiphrase query, for example, from units: Directory indexStore = newDirectory(); RandomIndexWriter writer = new RandomIndexWriter(random(), indexStore); add("blueberry chocolate pie", writer); add("blueberry chocolate tart", writer); IndexReader r = writer.getReader();

Re: Order synonyms

2015-01-20 Thread Aurélien MAZOYER
Hi, I am afraid you are not using the right component. In your example, you will match "apple", "darty" and "boulanger" documents, sorted by the default Solr scoring mechanism (TF-IDF), which won't take the order you specified in your synonyms.txt file into account for the scoring. If you want to o

PostingsHighlighter highlighted snippet size (fragsize)

2015-01-20 Thread Zisis Tachtsidis
Hi all, I'm using SolrCloud 4.10.0 and trying to incorporate PostingsSolrHighlighter. One issue that I'm having is that I cannot have the functionality of "hl.fragsize" in PostingsSolrHighlighter. How can I limit the size of the highlighted text? I get highlighted results but their snippet size va

Re: Newly observed Facets

2015-01-20 Thread harish singh
Well, that is the problem I am facing. Just checking if there is a way to compute the diff from the 18th for the 19th. One option is: get all the facets for the 19th, get all the facets for the 18th, do a diff and eliminate the intersection. But this isn't optimal as the number of facets returned by the Solr query can b
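The diff described above can be sketched client-side in a few lines of Python. This is only an illustration of the set arithmetic, assuming the facet values for each period have already been fetched into plain lists; the function name `newly_observed` is made up for the example, and it does nothing about the concern over how many facet values the query returns.

```python
def newly_observed(current_values, previous_values):
    """Return values seen in the current period but not in the previous one.

    Order of first appearance in current_values is preserved and
    duplicates are dropped.
    """
    seen_before = set(previous_values)
    emitted = set()
    result = []
    for value in current_values:
        if value not in seen_before and value not in emitted:
            result.append(value)
            emitted.add(value)
    return result


if __name__ == "__main__":
    # Hypothetical usernames for two consecutive days.
    print(newly_observed(["a", "b", "e"], ["a", "b", "c", "d"]))
```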

Order synonyms

2015-01-20 Thread Antoine REBOUL
Hello (sorry for my English, I use a translator), I use synonyms in Solr. My question is the following: how can I order the results list according to the order of the synonyms? My synonyms are written as follows in the mysynonyms.txt file: ipad => apple, Darty, Boulanger I want that when you sea

Re: shards per disk

2015-01-20 Thread Shawn Heisey
On 1/20/2015 7:45 AM, Nimrod Cohen wrote: > All shards are on the same system each one use different port. > BTW > Data size is about 1T, memory is 192G. If Solr has to actually go to the disk to satisfy a query, it's going to be slow. This will always be true, no matter how many disks you use.

Re: Newly observed Facets

2015-01-20 Thread Shawn Heisey
On 1/20/2015 8:52 AM, harish singh wrote: > Yes I got that. But I am still stuck at this point. Consider it like this: > I do not know what are the usernames in all the documents. > I only know there is time associated with each record. > > So Say, I have usernames "a", "b", "c", "d" present in my

Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Shawn Heisey
On 1/20/2015 7:18 AM, Nitin Solanki wrote: > Thanks and sorry for Stackoverflow. You are saying that use "string" > type. But I have used filter = solr.ShingleFilterFactory to break a > string into ngrams. > I want to build query correction just like google is doing - "Did you > mean". Shalin is s

Re: Newly observed Facets

2015-01-20 Thread harish singh
Yes I got that. But I am still stuck at this point. Consider it like this: I do not know what are the usernames in all the documents. I only know there is time associated with each record. So Say, I have usernames "a", "b", "c", "d" present in my data for the 18th of January. And for the 19th, I h

Re: AW: transactions@Solr(J)

2015-01-20 Thread Michael Sokolov
Yes -- autoCommit works just the same as if you had a timer in your app committing. You have to turn it off if you want to maintain the ability to roll back predictably. -Mike On 01/20/2015 09:19 AM, Clemens Wyss DEV wrote: Thanks Mike, but a key difference is that when one client commits,

Leaders in Recovery Failed state

2015-01-20 Thread anand.mahajan
Hi all, I have a cluster with 36 shards and 3 replicas per shard. I had to recently restart the entire cluster - most of the shards & replicas are back up - but a few shards have not had any leaders for a very long time (close to 18 hours now) - I tried reloading these cores and even the servlet co

Leaders in Recovery Failed state

2015-01-20 Thread anand.mahajan
Hi all, I have a cluster with 36 shards and 3 replicas per shard. I had to recently restart the entire cluster - most of the shards & replicas are back up - but a few shards have not had any leaders for a very long time (close to 18 hours now) - I tried reloading these cores and even the servlet conta

RE: shards per disk

2015-01-20 Thread Nimrod Cohen
Hi, All shards are on the same system; each one uses a different port. BTW, data size is about 1T and memory is 192G. NIMROD COHEN Software Engineer RTI (T) +972 (9) 775-3668 (M) +972 (0) 52-5522901 nimrod.co...@nice.com www.nice.com -Original Message- From: Nitin Solanki [mailto:nitinml...@

Re: Using SolrCloud to implement a kind of federated search

2015-01-20 Thread Jürgen Wagner (DVT)
Hello Charlie, theoretically, things may work as you describe them. A few big HOWEVERs exist as far as I can see: 1. Attributes: as different organisations may use different schemata (document attributes), the consolidation of results from multiple sources may present a problem. This may not ari

Re: shards per disk

2015-01-20 Thread Nitin Solanki
Hey Nimrod, Nice try. I just want to know whether these 8 shards are each on a different system, or whether you implemented sharding on a single system with each shard on a different port? On Tue, Jan 20, 2015 at 7:54 PM, Nimrod Cohen wrote: > Hi > > I did some performance tests, and I wanted to know if any o

Re: shards per disk

2015-01-20 Thread Jack Krupansky
It sounds like your app needs a lot more RAM so that it is not doing so much I/O. -- Jack Krupansky On Tue, Jan 20, 2015 at 9:24 AM, Nimrod Cohen wrote: > Hi > > I did some performance tests, and I wanted to know if anyone saw the same > behavior. > > > > We need to get 1K documents out of 100

shards per disk

2015-01-20 Thread Nimrod Cohen
Hi, I did some performance tests, and I wanted to know if anyone saw the same behavior. We need to get 1K documents out of 100M documents each time we query Solr and send them to text analysis. The first configuration had 8 shards on one RAID (Disk F); we got the 1K in around 15 seconds. Second conf

Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Nitin Solanki
Thanks, and sorry for Stack Overflow. You are saying that I should use the "string" type. But I have used filter = solr.ShingleFilterFactory to break a string into ngrams. I want to build query correction just like Google does - "Did you mean". i) I am storing ngrams into gram field and have only single this

AW: transactions@Solr(J)

2015-01-20 Thread Clemens Wyss DEV
Thanks Mike, > but a key difference is that when one client commits, all clients will see > the updates That's ok. What about the autoCommit setting(s) in solrconfig.xml? Doesn't this mean that after adding x elements (or after a certain timeframe), the changes are committed and hence no longer rollbackable

Using SolrCloud to implement a kind of federated search

2015-01-20 Thread Charlie Hull
Hi all, We've been discussing a way of implementing a federated search by leveraging the distributed query parts of SolrCloud. I've written this up at http://www.flax.co.uk/blog/2015/01/20/solr-superclusters-for-improved-federated-search/ and would welcome any comments or feedback. So far, two com

Re: transactions@Solr(J)

2015-01-20 Thread Michael Sokolov
On 1/20/2015 5:18 AM, Clemens Wyss DEV wrote: http://stackoverflow.com/questions/10805117/solr-transaction-management-using-solrj Is it true, that a SolrServer-instance denotes a "transaction context"? Say I have two concurrent threads, each having a SolrServer-instance "pointing" to the same c

AW: AW: AW: TermsComonent, buildOnCommit?

2015-01-20 Thread Clemens Wyss DEV
Great! -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Tuesday, 20 January 2015 13:25 To: solr-user@lucene.apache.org Subject: Re: AW: AW: TermsComonent, buildOnCommit? Hi Clemens, Please see https://issues.apache.org/jira/browse/SOLR-1487 for a

Re: AW: AW: TermsComonent, buildOnCommit?

2015-01-20 Thread Ahmet Arslan
Hi Clemens, Please see https://issues.apache.org/jira/browse/SOLR-1487 for a SolrJ workaround. Ahmet On Tuesday, January 20, 2015 2:22 PM, Clemens Wyss DEV wrote: Thx, but sorry for asking: what is the SolrJ corresponding command? SolrServer#commit() SolrServer# commit( boolean waitFlush, b

AW: AW: TermsComonent, buildOnCommit?

2015-01-20 Thread Clemens Wyss DEV
Thx, but sorry for asking: what is the corresponding SolrJ command? SolrServer#commit() SolrServer#commit( boolean waitFlush, boolean waitSearcher ) SolrServer#commit( boolean waitFlush, boolean waitSearcher, boolean softCommit ) -Original Message- From: Ahmet Arslan [mailto:iori

Re: AW: TermsComonent, buildOnCommit?

2015-01-20 Thread Ahmet Arslan
Hi, curl 'http://localhost:8983/solr/core/update?commit=true&expungeDeletes=true' Ahmet On Tuesday, January 20, 2015 1:51 PM, Clemens Wyss DEV wrote: > Deleted terms could confuse you they do ;) >commit with expunge deletes How is this done? -Original Message- From: Ahmet Arsla
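For anyone scripting the same request, a minimal Python sketch of how that URL is put together follows. The helper name `expunge_commit_url` is invented for illustration, and the host, port and core name are simply the placeholders from the curl line; only the `commit` and `expungeDeletes` parameters come from the message itself.

```python
from urllib.parse import urlencode


def expunge_commit_url(base_url, core):
    """Build the update URL that asks Solr to commit and expunge deletes."""
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


if __name__ == "__main__":
    # Host, port and core name are placeholders taken from the curl example.
    print(expunge_commit_url("http://localhost:8983/solr", "core"))
```

When the resulting URL is pasted into a shell, it needs quoting, since an unquoted `&` ends the command and the second parameter is lost.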

Re: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Shalin Shekhar Mangar
I already replied to you on stack overflow but your response there and the schema.xml definition here are contrary to each other. You are using a textSpell field which is tokenized as a unique key. As I mentioned on stack overflow, it is a bad idea. Yes, it will impact performance as well as lead

Fwd: Issue : Replacing ID with another will degrade performance in Solr?

2015-01-20 Thread Nitin Solanki
Hi, I am working on Solr 4.10.2. I have run into a *performance issue*: I have indexed 600MB of data on 4 shards with a single replica each. I have defined 2 fields (ngram and frequency). I have removed the ID field and replaced it with the ngram field. Therefore, Search perfor

AW: TermsComonent, buildOnCommit?

2015-01-20 Thread Clemens Wyss DEV
> Deleted terms could confuse you they do ;) >commit with expunge deletes How is this done? -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] Sent: Tuesday, 20 January 2015 12:22 To: solr-user@lucene.apache.org Subject: Re: TermsComonent, buildOnCommit?

Re: TermsComonent, buildOnCommit?

2015-01-20 Thread Ahmet Arslan
Hi, Deleted terms could confuse you. commit with expunge deletes or optimise will purge deleted terms. Ahmet On Tuesday, January 20, 2015 1:03 PM, Clemens Wyss DEV wrote: Does the TermsComponent (/terms) have something like buildOnCommit ? Or is it always up-to-date (<- my unit tests deny t

TermsComonent, buildOnCommit?

2015-01-20 Thread Clemens Wyss DEV
Does the TermsComponent (/terms) have something like buildOnCommit ? Or is it always up-to-date (<- my unit tests deny this)?

transactions@Solr(J)

2015-01-20 Thread Clemens Wyss DEV
http://stackoverflow.com/questions/10805117/solr-transaction-management-using-solrj Is it true, that a SolrServer-instance denotes a "transaction context"? Say I have two concurrent threads, each having a SolrServer-instance "pointing" to the same core. Then each thread can add/update/delete do

Re: Newly observed Facets

2015-01-20 Thread Alvaro Cabrerizo
Hi Harish, What I was asking you to do in my previous mail was to try (yourself) to understand your data using specific queries. Apart from that, remember that faceting is done over indexed data, so if you have two documents with nameA as "user A" and nameB as "user B", and they are tokenized you

Re: Newly observed Facets

2015-01-20 Thread harish singh
I am not querying for specific usernames. Each day, there will be many usernames observed at different times. But there might be some usernames that were never seen in the last 30 days, but they were observed today. That is the main challenge I am having. How to identify which usernames from tod

Re: Newly observed Facets

2015-01-20 Thread Alvaro Cabrerizo
OK. So, as commented before, in case your startTimeISO is single-valued you only need to add the range clause: startTimeISO:["2015-01-19T00:00:00.000Z" TO "2015-01-20T00:00:00.000Z"]. There is no need to add both NOT A AND B as the documents that satisfy B will automatically satisfy A. If you q
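As a small illustration, the range clause above can be generated for any given day with a short Python helper. This is a sketch, assuming the field holds UTC timestamps; the function name `day_range_clause` is hypothetical, and the output simply mirrors the quoted-bounds form used in the message.

```python
from datetime import datetime, timedelta, timezone


def day_range_clause(field, day):
    """Build a Solr-style range clause covering one UTC day.

    Both bounds are midnight timestamps in ISO 8601 form with millisecond
    precision and a trailing Z. Square brackets make both bounds inclusive.
    """
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return f'{field}:["{start.strftime(fmt)}" TO "{end.strftime(fmt)}"]'


if __name__ == "__main__":
    print(day_range_clause("startTimeISO", datetime(2015, 1, 19)))
```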

Re: Improved suggester question

2015-01-20 Thread Dinu Suman
Maybe this is because of the "<" sign. Encode it and try again. len must be <= 32767; got 35680 On Tue, Jan 13, 2015 at 7:51 PM, Dan Davis wrote: > The suggester is not working for me with Solr 4.10.2 > > Can anyone shed light over why I might be getting the exception below when > I build the

Re: Newly observed Facets

2015-01-20 Thread harish singh
Every entry in the document has a username, starttimeISO and uuid (which is not starttimeiso) So every record has a starttimeISO which is the time when the username was seen. The document looks like this: { Uuid: xxx StartTimeISO: 2015-01-18T00:00:00.000Z Username: abc } There are multiple recor

Re: Newly observed Facets

2015-01-20 Thread Alvaro Cabrerizo
Is startTimeISO single- or multi-valued? (In other words, do you add a new value to startTimeISO every time a document is observed, or just the first time?) The idea is to clarify the data you store in the index. So what does your document look like: id:XXX-YYY-ZZZ name: theName startTimeISO:[2015-01