We see similar results; again, we softCommit every 1s (trying to get as NRT
as we can), and we very rarely get any hits in our caches. As an
unscheduled test last week, we shut down indexing and noticed about an 80%
hit rate in caches (and average query time dropped from ~1s to 100ms!), so I
think
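For reference, a one-second soft commit like the one described above is typically configured in solrconfig.xml; the values here are illustrative:

```xml
<!-- soft commit every 1s for near-real-time visibility -->
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
<!-- pair with a less frequent hard commit so the transaction log stays bounded -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
```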
Try setting dataSource=null for your top-level entity and
use filename=\.zip$ as the filename selector.
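A DIH entity along those lines might look like this sketch (baseDir, entity name, and the exact fileName regex are placeholders; an inner entity would hand each file to Tika):

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- dataSource="null": the outer entity only lists files, it reads nothing -->
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null" baseDir="/path/to/files"
            fileName=".*\.zip$" recursive="true"/>
  </document>
</dataConfig>
```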
On 28.06.2013 23:14, ericrs22 wrote:
unfortunately not. I had tried that before with the logs saying:
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
Hi, I'm new to Solr. I want to index pdf files using the Data Import Handler.
I'm using Solr 4.3.0. I followed the steps given in this post:
http://lucene.472066.n3.nabble.com/indexing-with-DIH-and-with-problems-td3731129.html
However, I get the following error -
Full Import
The Tika jars are not in your classpath. You need to add all the jars
inside the contrib/extraction/lib directory to your classpath.
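One way to do that without copying jars around is a `<lib>` directive in solrconfig.xml; the dir path is relative to the core's instance directory and may need adjusting for your layout:

```xml
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```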
On Mon, Jul 1, 2013 at 2:00 PM, archit2112 archit2...@gmail.com wrote:
Hi, I'm new to Solr. I want to index pdf files using the Data Import Handler.
I'm using
Hi Erick,
Thanks for the reply.
Here is what the situation is:
Relevant portion of Solr Schema:
<field name="Content" type="text_general" indexed="false" stored="true"
required="true"/>
<field name="ContentSearch" type="text_general" indexed="true"
stored="false" multiValued="true"/>
<field
Hello everyone,
we are currently working on a multilanguage single-core setup.
While doing that, I stumbled upon the question of whether it is possible to define
different sources for the spellcheck.
For now I only see the possibility of defining different request handlers. Is it
somehow possible to set
the
Hi,
We need to find the sum of a field for each facet.query. We have
looked at StatsComponent (http://wiki.apache.org/solr/StatsComponent), but
that supports only facet.field. Has anyone written a patch over
StatsComponent that supports this, along with some performance measures?
Is
Hello friends,
I have a schema which contains various types of records in three different
categories, for ease of management and so that a single query can fetch all
the data. The fields are grouped into three different types of records. For
example:
fields type 1:
<field name="x_date" type="tdate"
My entire concern is to be able to make a single query to fetch all the types
of records. If I had to create three different cores for these different
types of data, I would have to make 3 calls to Solr to fetch the entire set
of data. And I will have approximately 15 such types in practice.
Also, at
Hi
Thanks a lot. I did what you said. Now I'm getting the following error.
Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
java.util.regex.PatternSyntaxException: Dangling meta character '*' near
index 0
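That exception typically comes from using a glob like *.zip where a regex is expected: a leading '*' has no preceding token to repeat, so java.util.regex rejects it. A quick illustration (class name is arbitrary):

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexDemo {
    public static void main(String[] args) {
        // "*.zip" is shell-glob syntax, not a regex: the leading '*'
        // has nothing to repeat, so compilation fails.
        try {
            Pattern.compile("*.zip");
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
        }
        // The regex equivalent escapes the dot and anchors the extension.
        Pattern p = Pattern.compile(".*\\.zip$");
        System.out.println(p.matcher("archive.zip").matches());  // true
        System.out.println(p.matcher("archive.txt").matches());  // false
    }
}
```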
--
View this message in context:
Check out http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.dictionary
- you can define multiple dictionaries in the same handler, each with its own
source field.
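Following that wiki page, a sketch of two dictionaries in one SpellCheckComponent could look like this (dictionary names and source fields are placeholders):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">english</str>
    <str name="field">spell_en</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">german</str>
    <str name="field">spell_de</str>
  </lst>
</searchComponent>
```

A request then selects one with spellcheck.dictionary=english (or german).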
--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
On 1 July 2013 at 11:34, Timo Schmidt wrote:
Daniel:
Soft commits invalidate the top level caches, which include
things like filterCache, queryResultCache etc. Various
segment-level caches are NOT invalidated, but you really
don't have a lot of control from the Solr level over those
anyway.
But yeah, the tension between caching a bunch of
OK, have you done anything custom? You get
this where? solr logs? Echoed back in the browser?
In response to what command?
You haven't provided enough info to help us help you.
You might review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Mon, Jul 1, 2013 at 6:08 AM, archit2112
I figured it out. It was a problem with the regular expression I used in
data-config.xml.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Index-pdf-files-tp4074278p4074304.html
Sent from the Solr - User mailing list archive at Nabble.com.
bq: But looks like it is executing the search for an exact text based
match with the stem burn.
Right. You need to understand index-time as opposed to query-time stemming.
Your field
definition has both turned on. The admin/analysis page will help here.
At index time, the terms are stemmed,
Have you tried the query you indicated? Because it should
just work barring syntax errors. The only other thing you
might want is to turn on grouping by field type. That'll
return separate sections by type, say the top 3 (default 1)
documents in each type. If you don't group, you have the
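The grouping described above is driven by request parameters; a sketch (the field name `type` is an assumption from this thread):

```
q=*:*&group=true&group.field=type&group.limit=3
```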
Hi,
When doing distributed searches with shards.tolerant set while the hosts for a
slice are down, and the response is therefore partial, how is that best inferred?
We would like to not cache the results upstream and perhaps inform the end user
in some way.
I am aware that shards.info could be
Hi
I'm trying to index pdf files in Solr 4.3.0 using the data import handler.
*My request handler - *
<requestHandler name="/dataimport1"
  class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config1.xml</str>
  </lst>
</requestHandler>
*My
It all depends on your data model - tell us more about your data model.
For example, how will users or applications query these documents and what
will they expect to be able to do with the ID/key for the documents?
How are you expecting to identify documents in your data model?
-- Jack
I'm new to Solr. I'm just trying to understand and explore the various features
offered by Solr and their implementations. I would be very grateful if you
could solve my problem with any example of your choice. I just want to learn
how I can index pdf documents using the data import handler.
Hey, I have tried to make use of the UniqFieldsUpdateProcessorFactory in
order to achieve distinct values in multivalued fields. Example below:
<updateRequestProcessorChain name="uniq_fields">
  <processor
    class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
    <lst name="fields">
It's really 100% up to you how you want to come up with the unique key
values for your documents. What would you like them to be? Just use that.
Anything (within reason) - anything goes.
But it also comes back to your data model. You absolutely must come up with
a data model for how you
So the general solution is to index the field twice, once with stemming and
once without, in order to have the ability to do both stemmed and exact matches.
I am already indexing the text twice using the ContentSearch and
ContentSearchStemming fields. But what this allows me is to return
Your stated problem seems to have nothing to do with the message subject
line relating to RemoveDuplicatesTokenFilterFactory. Please start a new
message thread unless you really are concerned with an issue related to
RemoveDuplicatesTokenFilterFactory.
This kind of thread hijacking is
I was just wondering if another solution might work. If we are able to extract
the stem of the input search term (maybe using a C#-based stemmer or some
open-source implementation of the Porter algorithm) for cases where the
stemming option is selected, and submit the query to Solr as a multiple
On Jul 1, 2013, at 6:56 AM, Phil Hoy p...@brightsolid.com wrote:
Perhaps an http header could be added or another attribute added to the solr
result node.
I thought that was already done - I'm surprised that it's not. If that's really
the case, please make a JIRA issue.
- Mark
Hello everybody,
I have tried to make use of the UniqFieldsUpdateProcessorFactory in
order to achieve distinct values in multivalued fields. Example below:
<updateRequestProcessorChain name="uniq_fields">
  <processor
    class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
Hi,
I have the following data model:
1. Document (fields: doc_id, author, content)
2. Each Document has multiple attachment types. Each attachment type has
multiple instances. And each attachment type may have different fields.
for example:
<doc>
  <doc_id>1</doc_id>
  <author>john</author>
Have a look at the DedupUpdateProcessorFactory, which may help you.
Although, I'm not sure if it works with multivalued fields.
Upayavira
On Mon, Jul 1, 2013, at 02:34 PM, tuedel wrote:
Hello everybody,
I have tried to make use of the UniqFieldsUpdateProcessorFactory in
order to achieve
Simply duplicate a subset of the fields that you want to query of the parent
document on each child document and then you can directly query the child
documents without any join.
Yes, given the complexity of your data, a two-step query process may be
necessary for some queries - do one query
Unfortunately, update processors only see the new, fresh, incoming data,
not any existing document data.
This is a case where your best bet may be to read the document first and
then merge your new value into the existing list of values.
-- Jack Krupansky
-Original Message-
From:
Hi,
I am using Solr 4.3.0.
If I change my Solr schema.xml, do I need to re-index my data? And
if yes, how?
My 2nd question is I need to find the frequency of term per document in all
documents of search result.
My field is
<field name="CommentX" type="text_general" stored="true"
Hi,
blockUntilFinished() sometimes blocks indefinitely. But if I send a commit from
another thread to the instance, the ConcurrentUpdateSolrServer unblocks, sends
the rest of the documents, and commits. So the sequence looks like this:
1. adding documents as usual...
2. finish adding documents...
3. block
You can write any function query in the field list of the fl parameter.
Sounds like you want termfreq:
termfreq(field_arg,term)
fl=id,a,b,c,termfreq(a,xyz)
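As a concrete request, that might look like this (core name, field, and term are placeholders):

```
http://localhost:8983/solr/collection1/select?q=*:*&fl=id,CommentX,termfreq(CommentX,'solr')
```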
-- Jack Krupansky
-Original Message-
From: Tony Mullins
Sent: Monday, July 01, 2013 10:47 AM
To: solr-user@lucene.apache.org
Hi,
I have recently upgraded from Solr 3.5 to 4.2.1.
We have also added a spellcheck feature to our search query. During our
performance testing we have observed that for every 2000 requests, 1 request
fails.
The exceptions we observe in the Solr log are ConcurrentModificationExceptions.
Below is the
Regrettably, visibility is key for us :( Documents must be searchable as
soon as they have been indexed (or as near as we can make it). Our old
search system didn't do relevance sorting; it was time-ordered (so it had a
much simpler job), but it did have sub-second latency, and that is what is
Hi
Does solr cloud on a cluster of servers require passwordless ssh to be
configured between the servers?
--
View this message in context:
http://lucene.472066.n3.nabble.com/Does-solr-cloud-required-passwordless-ssh-tp4074398.html
To answer the previous post:
I was not sure what datasource=binaryFile meant; I took it from a PDF sample,
thinking that would help.
After setting dataSource=null I'm still getting the same errors...
<dataConfig>
  <dataSource type="BinFileDataSource" user="svcSolr"
    password="SomePassword" />
  <document>
as for the second option:
If you look inside SolrResourceLoader, you will notice that before a
CoreContainer is created, a new class loader is also created
line:111
this.classLoader = createClassLoader(null, parent);
however, this parent object is always null, because it is called from:
No, SolrCloud does not currently use ssh.
- Mark
On Jul 1, 2013, at 12:58 PM, adfel70 adfe...@gmail.com wrote:
Hi
Does solr cloud on a cluster of servers require passwordless ssh to be
configured between the servers?
Thanks Jack, it worked.
Could you please provide some info on how to re-index existing data in
Solr, after changing the schema.xml ?
Thanks,
Tony
On Mon, Jul 1, 2013 at 8:21 PM, Jack Krupansky j...@basetechnology.com wrote:
You can write any function query in the field list of the fl
IIRC Zip files are not supported
On Mon, Jul 1, 2013 at 10:30 PM, ericrs22 ericr...@yahoo.com wrote:
To answer the previous post:
I was not sure what datasource=binaryFile meant; I took it from a PDF sample,
thinking that would help.
After setting dataSource=null I'm still getting the same errors...
I'm using the Tika plugin to do so and according to
http://tika.apache.org/0.5/formats.html it does
*ZIP archive (application/zip) Tika uses Java's built-in Zip classes to
parse ZIP files.
Support for ZIP was added in Tika 0.2.*
In schema.xml I know you can label a field as stored=false or
stored=true, but if you say neither, which is it by default?
Thank you
Katie
Haven't tried it recently, but is that even legal? Just be explicit :)
Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Mon, Jul 1, 2013 at 2:16 PM, Katie McCorkell
katiemccork...@gmail.com wrote:
In schema.xml I know you can
If all your fields are stored, you can do it with
http://search-lucene.com/?q=solrentityprocessor
Otherwise, just reindex the same way you indexed in the first place.
*Always* be ready to reindex from scratch.
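If all fields are stored, a minimal DIH config along those lines might look like this sketch (the URL and core name are placeholders):

```xml
<dataConfig>
  <document>
    <!-- SolrEntityProcessor pulls stored fields back out of an existing core -->
    <entity name="reindex" processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/oldcore"
            query="*:*"/>
  </document>
</dataConfig>
```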
Otis
--
Solr ElasticSearch Support -- http://sematext.com/
Performance Monitoring --
Hey Ahmet / Solr User Group,
I tried using the built-in UpdateCSV and it runs A LOT faster than a
FileDataSource DIH, as illustrated below. However, I am a bit confused about the
numDocs/maxDoc values when doing an import this way. Here's my GET command
against a tab-delimited file: (I
Is it conceivable that there's too much traffic, causing Solr to stall
re-opening the searcher (thus switching to the new index)? I'm grasping at
straws, and this is beginning to bug me a lot. The traffic logs wouldn't
seem to support this (apart from periodic health-check pings, the load is
In Solr's admin statistics page, there is a 'current' flag indicating whether
the core's index reader is 'current' or not. According to some discussions on
this mailing list a few months back, it wouldn't affect anything. But my
observation is completely different. When the current flag was not
On 7/1/2013 12:56 PM, Mike L. wrote:
Hey Ahmet / Solr User Group,
I tried using the built-in UpdateCSV and it runs A LOT faster than a
FileDataSource DIH, as illustrated below. However, I am a bit confused about the
numDocs/maxDoc values when doing an import this way. Here's my GET
stored and indexed both default to true.
This is legal:
<field name="alpha" type="string" />
This detail will be in Early Access Release #2 of my book on Friday.
-- Jack Krupansky
-Original Message-
From: Otis Gospodnetic
Sent: Monday, July 01, 2013 2:21 PM
To:
On Mon, Jul 1, 2013 at 3:50 PM, Jack Krupansky j...@basetechnology.com wrote:
stored and indexed both default to true.
This is legal:
<field name="alpha" type="string" />
Actually, for fields I believe the defaults come from the fieldType.
The fieldType defaults to true for both indexed and
On 7/1/2013 1:07 PM, Neal Ensor wrote:
Is it conceivable that there's too much traffic, causing Solr to stall
re-opening the searcher (thus releasing to the new index)? I'm grasping at
straws, and this is beginning to bug me a lot. The traffic logs wouldn't
seem to support this (apart from
Correct - the field definitions inherit the attributes of the field type,
and it is the field type that has the actual default values for indexed and
stored (and other attributes.)
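In schema.xml terms, the inheritance described above looks like this (a sketch; solr.StrField is just one example type):

```xml
<!-- the type carries the actual defaults for indexed/stored -->
<fieldType name="string" class="solr.StrField" indexed="true" stored="true"/>

<!-- a field that omits those attributes inherits them from its type -->
<field name="alpha" type="string"/>
```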
-- Jack Krupansky
-Original Message-
From: Yonik Seeley
Sent: Monday, July 01, 2013 3:56 PM
To:
Or, go with a commercial product that has a single-click Solr re-index
capability, such as:
1. DataStax Enterprise - data is stored in Cassandra and reindexed into Solr
from there.
2. LucidWorks Search - data sources are declared so that the package can
automatically re-crawl the data
I have some custom code that uses the top-level FieldCache (e.g.,
FieldCache.DEFAULT.getLongs(reader, "foobar", false)). I'd like to redesign
this to use the per-segment FieldCaches so that re-opening a Searcher is
fast(er). In most cases, I've got a docId and I want to get the value for a
Thanks Erick/Jagdish.
Just to give some background on my queries.
1. All my queries are unique. A query can be: "ipod" and "ipod 8gb" (but
these are unique). These are about 1.2M in total.
So I assume setting a high queryResultCache, queryResultWindowSize and
queryResultMaxDocsCached won't help.
Hi all,
I noticed that for Solr 4.2, when an internal call is made between two nodes
Solr uses the list of matching document ids to fetch the document details. At
this time, it prints out all matching document ids as a part of the query. Is
there a way to suppress these log statements from
I would say definitely investigate the performance of the query, but also
since you're using CachedSqlEntityProcessor, you might want to back off on
the transaction isolation to READ_COMMITTED, which I think is the lowest
one that Oracle supports:
On 7/1/2013 3:24 PM, Niran Fajemisin wrote:
I noticed that for Solr 4.2, when an internal call is made between two nodes
Solr uses the list of matching document ids to fetch the document details. At
this time, it prints out all matching document ids as a part of the query. Is
there a way to
not sure if this will help any.
Here's the verbose log
INFO - 2013-07-01 23:17:08.632;
org.apache.solr.handler.dataimport.DataImporter; Loading DIH Configuration:
tika-data-config.xml
INFO - 2013-07-01 23:17:08.648;
org.apache.solr.handler.dataimport.DataImporter; Data Configuration loaded
On Mon, Jul 1, 2013 at 5:56 PM, adfel70 adfe...@gmail.com wrote:
This requires me to override the solr document distribution mechanism.
I fear that with this solution I may lose some of Solr Cloud's
capabilities.
It's not clear whether you are aware of
In my experience, deeply nested scopes are almost exclusively for SOLR-3076.
On Sat, Jun 29, 2013 at 1:08 PM, Sperrink
kevin.sperr...@lexisnexis.co.za wrote:
Good day,
I'm seeking some guidance on how best to represent the following data
within
a solr schema.
I have a list of subjects which are