On Thu, Apr 19, 2012 at 9:21 PM, Jeevanandam Madanagopal
je...@myjeeva.com wrote:
Please have a look
http://wiki.apache.org/solr/DistributedSearch
-Jeevanandam
On Apr 19, 2012, at 9:14 PM, Ramprakash Ramamoorthy wrote:
Dear all,
I came across this while browsing through lucy
On Thu, Apr 19, 2012 at 3:12 PM, Sami Siren ssi...@gmail.com wrote:
I have a simple solrcloud setup from trunk with default configs; 1
shard with one replica. As few other people have reported there seems
to be some kind of leak somewhere that causes the number of open files
to grow over time
Hi,
I am designing a custom scraping solution. I need to store my data, do some
post processing on it and then import it into SOLR.
If I want to import data into SOLR in the quickest, easiest way possible,
what format should I be saving my scraped data in? I get the impression
that .XML would
In Solr/Lucene, a shard is one part of an index. There cannot be
multiple indices in one shard.
All of the shards in an index share the same schema, and no document
is in two or more shards. Distributed search, as implemented by Solr,
searches several shards in one index.
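In practice a single query fans out via the shards parameter; here is a rough sketch of building such a request in Python (the host names are invented for illustration):

```python
from urllib.parse import urlencode

# Hypothetical shard locations -- substitute your own hosts.
shards = ["solr1:8983/solr", "solr2:8983/solr"]

params = {
    "q": "title:lucene",
    # one logical index, queried across both shards in a single request
    "shards": ",".join(shards),
}
query_string = urlencode(params)
url = "http://solr1:8983/solr/select?" + query_string
```

Any one node can receive the request; it fans the query out to every listed shard and merges the results.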
On Thu, Apr 19, 2012 at
The PolySearcher in Lucy seems to do exactly what Distributed
Search does in Solr.
On Fri, Apr 20, 2012 at 2:58 AM, Lance Norskog goks...@gmail.com wrote:
In Solr/Lucene, a shard is one part of an index. There cannot be
multiple indices in one shard.
All of the shards in an index share the same
James,
You could create xml files of format:
<add>
<doc><field name="id">1</field><field
name="Name"><![CDATA[James]]></field><field
name="Surname"><![CDATA[Willson]]></field></doc>
<!-- more doc's here -->
</add>
and then post them to SOLR using, for example, the post.sh utility from
SOLR's binary distribution.
HTH,
Dmitry
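A small sketch of generating such a file in Python (the field names are just the ones from the example above):

```python
def doc_xml(doc_id, name, surname):
    # CDATA sections keep the field values verbatim, as in the example above
    return (
        '<doc><field name="id">%s</field>'
        '<field name="Name"><![CDATA[%s]]></field>'
        '<field name="Surname"><![CDATA[%s]]></field></doc>'
        % (doc_id, name, surname)
    )

docs = [("1", "James", "Willson")]
payload = "<add>\n%s\n</add>" % "\n".join(doc_xml(*d) for d in docs)
```

The resulting file can then be posted to Solr's update handler, e.g. with post.sh or curl.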
The implementation of grouping in the trunk is completely different
from SOLR-236. Grouping works across distributed search:
https://issues.apache.org/jira/browse/SOLR-2066
committed last September.
On Thu, Apr 19, 2012 at 6:04 PM, Jean-Sebastien Vachon
jean-sebastien.vac...@wantedanalytics.com
Working with the DIH is a little easier if you make a database view and
load from that. You can set all of the field names and see exactly
what the DIH gets.
On Thu, Apr 19, 2012 at 10:11 AM, Ramo Karahasan
ramo.karaha...@googlemail.com wrote:
Hi,
Yes, I use every one of them.
Thanks for your
Good point! Do you store the large file in your documents, or just index them?
Do you have a largest-file-size limit in your environment? Try this:
ulimit -a
What is the file size?
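Those limits can also be read programmatically on a Unix box; a small sketch:

```python
import resource

# RLIMIT_NOFILE: max open file descriptors (relevant to "too many open files")
# RLIMIT_FSIZE: max size of a file the process may create
soft_files, hard_files = resource.getrlimit(resource.RLIMIT_NOFILE)
soft_fsize, hard_fsize = resource.getrlimit(resource.RLIMIT_FSIZE)
print("open files: soft=%s hard=%s" % (soft_files, hard_files))
```

A value of resource.RLIM_INFINITY (-1) means the limit is unbounded.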
On Thu, Apr 19, 2012 at 8:04 AM, Shawn Heisey s...@elyograg.org wrote:
On 4/19/2012 7:49 AM, Bram Rongen wrote:
Thanks. My colleague also pointed out a previous thread with the solution: add a
new update.chain for data import/update handlers to bypass the distributed
update processor.
A simpler use case example for SolrCloud newbies could be on distributed
search, to experience the features of the
Hi Rahul,
Thank you for the reply. I tried by modifying the
updateRequestProcessorChain as follows:
<updateRequestProcessorChain name="uima" default="true">
But still I am not able to see the UIMA fields in the result. I executed
the following curl command to index a file named test.docx
curl
The only way to get more elegant would be to
index the dates with the granularity you want, i.e.
truncate to DAY at index time then truncate
to DAY at query time as well.
Why do you consider ranges inelegant? How else
would you imagine it would be done?
Best
Erick
On Thu, Apr 19, 2012 at 4:07
CSV files can also be imported, which may be more
compact.
Best
Erick
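Generating such a CSV in Python might look like this (the field names are only illustrative):

```python
import csv
import io

rows = [
    {"id": "1", "Name": "James", "Surname": "Willson"},
    {"id": "2", "Name": "Jane", "Surname": "Doe"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "Name", "Surname"])
writer.writeheader()  # Solr's CSV handler reads field names from the header row
writer.writerows(rows)
csv_data = buf.getvalue()
```

The resulting text can then be posted to Solr's CSV update handler with Content-type text/csv.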
On Fri, Apr 20, 2012 at 6:01 AM, Dmitry Kan dmitry@gmail.com wrote:
James,
You could create xml files of format:
<add>
<doc><field name="id">1</field><field
name="Name"><![CDATA[James]]></field>
Hi Jean-Sebastien,
For some grouping features (like total group count and grouped
faceting), the distributed grouping requires you to partition your
documents into the right shard. Basically groups can't cross shards.
Otherwise the group counts or grouped facet counts may not be correct.
If you
Hi,
I just wanted to make sure I understand how distributed indexing works
in solrcloud.
Can I index locally at each shard to avoid throttling a central port? Or
does all the indexing have to go through a single shard leader?
thanks
Yeah, I'm indexing some PDF documents.. I've extracted the text through
tika (pre-indexing).. and the largest field in my DB is 20MB. That's quite
extensive ;) My Solution for the moment is to cut this text to the first
500KB, that should be enough for a decent index and search capabilities..
Hmm, reading your reply again I see that Solr only uses the first 10k
tokens from each field, so field length should not be a problem per se.. It
could be that my documents contain very large and unorganized tokens;
could this startle Solr?
On Fri, Apr 20, 2012 at 2:03 PM, Bram Rongen
Dear all,
Is there any way I can convert a SolrDocumentList to a DocList and
set it in the QueryResult object?
Or, the workaround adding a SolrDocumentList object to the
QueryResult object?
--
With Thanks and Regards,
Ramprakash Ramamoorthy,
Project Trainee,
Zoho Corporation.
We cannot avoid auto soft commit, since we need the Lucene NRT feature. And I
use StreamingUpdateSolrServer for adding/updating the index.
On Thu, Apr 19, 2012 at 7:42 AM, Boon Low boon@brightsolid.com wrote:
Hi,
Also came across this error recently, while indexing with 10 DIH
processes in
... Inelegant as opposed to the possibility of using /DAY to specify day
granularity on a single term query
In any case, if that's how SOLR works, that's fine
Any rough idea of the performance of range queries vs truncated day queries?
Otherwise, I might just write up a quick program to compare
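For what it's worth, a quick sketch of the two query shapes being compared (field name invented; the truncated-day value is quoted so its colons parse as a single term):

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"

def day_range_query(field, day):
    # explicit range covering one whole day
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    return "%s:[%s TO %s]" % (field, start.strftime(FMT), end.strftime(FMT))

def truncated_day_query(field, day):
    # single-term query against a field whose values were truncated
    # to DAY granularity at index time
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return '%s:"%s"' % (field, start.strftime(FMT))

d = datetime(2012, 4, 19, 15, 30)
range_q = day_range_query("timestamp", d)
term_q = truncated_day_query("timestamp", d)
```

The second form only works if the dates were truncated to DAY at index time, as Erick described.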
My understanding is that you can send your updates/deletes to any
shard and they will be forwarded to the leader automatically. That
being said, your leader will always be the place where the indexing
happens, and it is then distributed to the other replicas.
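As a toy illustration of the routing idea (not SolrCloud's actual algorithm, just a stable hash deciding which shard's leader receives a document):

```python
import hashlib

def shard_for(doc_id, num_shards):
    # A stable hash of the document id decides which shard (and thus which
    # leader) gets the document; any node can compute this and forward.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

assignments = {doc_id: shard_for(doc_id, 2) for doc_id in ["a", "b", "c", "d"]}
```

Because the hash is deterministic, every node routes a given id to the same leader, which then replicates to the other replicas of that shard.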
On Fri, Apr 20, 2012 at 7:54 AM, Darren Govoni
I was able to use solr 3.1 functions to accomplish this logic:
/solr/select?q=_val_:sum(query({!dismax qf=text v='solr
rocks'}),product(map(query({!dismax qf=text v='solr
rocks'},-1),0,100,0,1), product(this_field,that_field)))
Hi,
I want to build an index of quite a number of pdf and msword files using the
Data Import Request Handler and the Tika Entity Processor. It works very well.
Now I would like to use the md5 digest of the binary (pdf/word) file as the
unique key in the index. But I do not know how to
I have removed most of the file to protect the innocent. As you can see we
have a high-level item that has a subentity called skus, and those skus
contain subentities for size/width/etc. The database is configured for only 10
open cursors, and voila, when the 11th item is being processed
Yeah, this is a pretty ugly problem. You have two
problems, neither of which is all that amenable to
simple solutions.
1. Context at index time. St, in your example, is
either Saint or Street. Solr has nothing built
in to distinguish this, so you need to do some
processing
Right, this is often a source of confusion and there's a discussion about
this on the dev list (but the URL escapes me)..
Anyway, qt and defType have pretty much completely different meanings.
Saying defType=dismax means you're providing all the dismax
parameters on the URL.
Saying
I have to discard this method at this time. Thank you all the same.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Further-questions-about-behavior-in-ReversedWildcardFilterFactory-tp3905416p3926423.html
Sent from the Solr - User mailing list archive at Nabble.com.
BTW, nice problem statement...
Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got fixed
in the 3.6 time-frame. Don't have the time right now
to go back over the JIRA's to see...
Best
Erick
On Thu, Apr 19, 2012 at 3:39 PM, Cat Bieber
Hi,
I'm having issues with special characters in synonyms.txt on Solr 3.5.
I'm running a multi-lingual index and need certain terms to give results across
all languages no matter what language the user uses.
I figured that this should be easily resolved by just adding the different
words to
On Fri, Apr 20, 2012 at 12:10 PM, carl.nordenf...@bwinparty.com
carl.nordenf...@bwinparty.com wrote:
Directly injecting the letter ö into synonyms like so:
island, ön
island, ön
renders the following exception on startup (both lines render the same
error):
java.lang.RuntimeException:
Actually I would like to know the top terms at two levels: the document level
and the index file level.
1. The top terms at the document level means I would like to know the top
term frequencies across all documents (counting a term only once per document).
The solr schema.jsp seems to provide the top 10 terms, but
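Outside of Solr, the two counting modes can be sketched like this (toy documents, whitespace tokenization):

```python
from collections import Counter

docs = [
    "solr is fast and solr is open",
    "lucene is fast",
]

tokenized = [d.split() for d in docs]

# index level: every occurrence of a term counts
term_freq = Counter(t for doc in tokenized for t in doc)

# document level: each term counted at most once per document
# (this is what Lucene calls document frequency)
doc_freq = Counter(t for doc in tokenized for t in set(doc))
```

"solr" appears twice in the first document, so its index-level count is 2 but its document-level count is 1.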
Thanks for looking at this. I'll see if we can sneak an upgrade to 3.6
into the project to get this working.
-Cat
On 04/20/2012 12:03 PM, Erick Erickson wrote:
BTW, nice problem statement...
Anyway, I see this too in 3.5. I do NOT see
this in 3.6 or trunk, so it looks like a bug that got
Gotcha.
Now does that mean if I have 5 threads all writing to a local shard,
will that shard piggyback those index requests onto a SINGLE connection
to the leader? Or will they spawn 5 connections from the shard to the
leader? I really hope the former; the latter won't scale well.
On Fri,
Hello everyone,
I'm in the process of pulling together requirements for a SCM (source code
manager) crawling mechanism for our Solr index. I probably don't need to argue
the need for a crawler, but to be specific, we have an index which receives its
updates from a custom built application. I
I'm working on using Shuyo's work to improve the language identification of
our search. Apparently, it's been moved from Nutch to Solr. Is there a
reason for this?
http://code.google.com/p/language-detection/issues/detail?id=34
I would prefer to have the processing done in Nutch as that has
Thanks Jeevanandam. I couldn't get any regex pattern to work except a basic
one to look for sentence-ending punctuation followed by whitespace:
[.!?](?=\s)
However, this isn't good enough for my needs so I'm switching tactics at the
moment and working on plugging in OpenNLP's SentenceDetector
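For reference, here is what that basic pattern does when used to split (it consumes the matched punctuation, one reason it falls short of a real sentence detector):

```python
import re

# the pattern from the thread: sentence-ending punctuation
# followed by whitespace
pattern = re.compile(r"[.!?](?=\s)")

text = "Hello world. How are you? Fine!"
sentences = [s.strip() for s in pattern.split(text)]
# note: the '.' and '?' are removed by the split, and abbreviations
# like "Dr. Smith" would be split incorrectly
```

Limitations like these are why a trained model such as OpenNLP's SentenceDetector handles real text better.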
I believe the SolrJ code round-robins which server the request is sent
to, and as such probably wouldn't send to the same server in your case,
but if you had an HttpSolrServer for instance and were pointing to
only one particular instance, my guess would be that there would be 5
separate requests from the
Hi,
Solr just reuses Tika's language identifier. But you are of course free to do
your language detection on the Nutch side if you choose and not invoke the one
in Solr.
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
On 20. apr.
You could run the MLT for the document in question, then gather all
those doc id's in the MLT results and negate those in a subsequent
query. Not sure how robust that would work with very large result sets,
but something to try.
Another approach would be to gather the interesting terms from the
Hello,
I have been trying out deduplication in solr by following:
http://wiki.apache.org/solr/Deduplication. I have defined a signature field
to hold the values of the signature created based on few other fields in a
document and the idea seems to work like a charm in a single solr instance.
But,
I'm trying to index a few pdf documents using SolrJ as described at
http://wiki.apache.org/solr/ContentStreamUpdateRequestExample, below there's
the code:
import static
org.apache.solr.handler.extraction.ExtractingParams.LITERALS_PREFIX;
OK, this description really sounds like an XY problem. Why do you
want to do this? What is the higher-level problem you're trying to solve?
Best
Erick
On Fri, Apr 20, 2012 at 9:18 AM, Ramprakash Ramamoorthy
youngestachie...@gmail.com wrote:
Dear all,
Is there any way I can convert a
This might help:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
The bit here is you have to have Tika parse your file
and then extract the content to send to Solr...
Best
Erick
On Fri, Apr 20, 2012 at 7:36 PM, vasuj vasu.j...@live.in wrote:
Kristian,
For what it's worth, for http://search-lucene.com and http://search-hadoop.com
we simply check out the source code from the SCM and index from the file
system. It works reasonably well. The only issues that I can recall us having
is with the source code organization under SCM -
Hi Joe,
You could write a custom URP - Update Request Processor. This URP would take
the value from one SolrDocument field (say the one that has the full path to
your PDF and is thus unique), compute MD5 using Java API for doing that, and
would stick that MD5 value in some field that you've
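The MD5 step itself might look like the following sketch (reading in chunks so a large PDF need not fit in memory; the URP wiring around it is left out):

```python
import hashlib

def md5_of_file(path):
    # incremental digest: feed the file to MD5 in 8 KB chunks
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()
```

The hex digest is a stable string per file content, so it works as a unique key as long as the file bytes don't change.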
We are loading a long (number of seconds since 1970?) value into Solr using
java and Solrj. What is the best way to convert this into the right Solr date
fields?
Sent from my Mobile device
720-256-8076
On 21 April 2012 09:12, Bill Bell billnb...@gmail.com wrote:
We are loading a long (number of seconds since 1970?) value into Solr using
java and Solrj. What is the best way to convert this into the right Solr date
fields?
[...]
There are various options, depending on the source of
your
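One common option is to format the epoch value into Solr's canonical UTC date string; a sketch in Python (SolrJ users would do the equivalent with a UTC-configured SimpleDateFormat):

```python
from datetime import datetime, timezone

def epoch_to_solr_date(seconds):
    # Solr dates are UTC, ISO-8601, with a trailing 'Z'
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")
```

If the long is actually milliseconds since 1970 (as with java.util.Date.getTime()), divide by 1000 first.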