I did some research on the DIH config file schema and created my own DIH. I'm
getting this error when I run it:
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
</lst>
<lst name="initArgs">
  <lst name="defaults">
    <str name="config">try.xml</str>
  </lst>
</lst>
<str name="command">full-import</str>
Rich,
I played around for a few minutes with ScriptTransformers, but I don't have
enough knowledge to get anything done right now :/
My idea was: loop over the given row, which should be a Java HashMap or
something like that, and do something like this (pseudo-code):
var row_data = [];
for (var key in row) {
    row_data.push(key + '=' + row[key]);
}
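Completed as a runnable sketch (outside Solr; in a real DIH ScriptTransformer the row argument is a java.util.Map with get()/put(), so here a plain JavaScript object merely stands in for it), the loop idea might look like:

```javascript
// Sketch of the row-serializing loop, runnable in Node.js for illustration.
// In a real DIH ScriptTransformer the row is a java.util.Map; a plain
// JavaScript object stands in for it here.
function serializeRow(row) {
  var parts = [];
  for (var key in row) {
    if (Object.prototype.hasOwnProperty.call(row, key)) {
      parts.push(key + "=" + row[key]);
    }
  }
  return parts.join(" || ");
}

var row = { stt_id: 7, stt_name: "example" };
console.log(serializeRow(row));
```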
Cam,
the examples with the provided inline-documentation should help you, no?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
The backslash \ in that context looks like an escape character, to prevent
the = from being interpreted as an assignment.
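For example, a hypothetical synonyms.txt entry whose token itself contains an equals sign would need the backslash (the words here are made up, not from the original posting):

```
foo\=bar => foobar
```

Without the backslash, the = would be read as part of the mapping syntax rather than as part of the token.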
Regards
Stefan
Hi,
I am facing performance issues in three types of queries (and their
combination). Some of the queries take more than 2-3 mins. Index size is
around 150GB.
- Wildcard
- Proximity
- Phrases (with common words)
I know CommonGrams and Stop words are a good way to resolve such issues
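For reference, a field type wired up with CommonGrams might be sketched like this (the field type name and words file name are assumptions, not from the original posting):

```xml
<fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commongrams.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```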
Caused by: org.xml.sax.SAXParseException: Element type "field" must be
followed by either attribute specifications, ">" or "/>".
Sounds like invalid XML in your .. dataimport-config?
On Tue, Jan 25, 2011 at 5:41 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
http://pastebin.com/tjCs5dHm
this is the
Yeah, even after correcting it, it is still throwing an exception.
-
DINESHKUMAR . M
I am neither especially clever nor especially gifted. I am only very, very
curious.
On Tue, Jan 25, 2011 at 10:05 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
http://pastebin.com/CkxrEh6h
this is my sample log
[...]
And, which portions of the log text do you want to preserve?
Does it go into Solr as a single error message, or do you want
to separate out parts of it?
I want to extract the month, time, DHCPMESSAGE, from_mac, gateway_ip, net_ADDR
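As a sanity check of the general approach (not Dinesh's actual log format; the sample line, pattern, and field names below are assumptions), a regex of roughly this shape can pull those pieces out:

```javascript
// Minimal sketch (Node.js) of extracting fields from a dhcpd-style syslog
// line with a regex. The sample line and exact pattern are assumptions,
// not taken from the actual pastebin log.
const line =
  "Jan 25 10:05:01 dhcpd: DHCPACK on 10.0.0.5 to 00:11:22:33:44:55 via eth0";
const re =
  /^(\w{3})\s+\d+\s+([\d:]+)\s+dhcpd:\s+(\w+)\s+on\s+([\d.]+)\s+to\s+([0-9a-f:]+)\s+via\s+(\S+)/i;
const m = line.match(re);
const fields = {
  month: m[1],
  time: m[2],
  dhcp_message: m[3],
  net_addr: m[4],
  from_mac: m[5],
  gateway: m[6],
};
console.log(fields);
```

In DIH itself the equivalent pattern would go on the entity via a RegexTransformer rather than in script code.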
--
View this message in context:
http://lucene.472066.n3.nabble.com/Getting-started-with-writing-parser-tp2278092p2327738.html
this thread explains my problem
On Tue, Jan 25, 2011 at 11:44 AM, Dinesh mdineshkuma...@karunya.edu.in wrote:
I don't even know whether the regex expression that I'm using for my log is
correct or not..
If it is the same try.xml that you posted earlier, it is very likely not
going to work. You seem to have just cut and pasted
No, I actually changed the directory to mine where I stored the log files.. it
is /home/exam/apa..solr/example/exampledocs
I specified it in a Solr schema.. I created a DataImportHandler for that in
try.xml.. then in that I changed the file name to sample.txt
that new try.xml is
Hi,
I posted a question in November last year about indexing content from
multiple binary files into a single Solr document and Jayendra responded
with a simple solution to zip them up and send that single file to Solr.
I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't
Hi All,
I need to index the documents present in my file system at various
locations (e.g. C:\docs, D:\docs).
Is there any way through which I can specify this in my DIH
configuration?
Here is my configuration:-
<document>
  <entity name="sd"
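One common approach, sketched under the assumption that FileListEntityProcessor fits the use case (the entity names and attribute values here are my own, not from the original configuration), is one entity per base directory:

```xml
<document>
  <entity name="docs_c" processor="FileListEntityProcessor"
          baseDir="C:/docs" fileName=".*" recursive="true" rootEntity="false">
    <!-- inner entity that parses each file goes here -->
  </entity>
  <entity name="docs_d" processor="FileListEntityProcessor"
          baseDir="D:/docs" fileName=".*" recursive="true" rootEntity="false">
    <!-- inner entity that parses each file goes here -->
  </entity>
</document>
```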
On Tue, 2011-01-25 at 10:20 +0100, Salman Akram wrote:
Cache warming is a good option too, but the index gets updated every hour, so
I'm not sure how much that would help.
What is the time difference between queries with a warmed index and a
cold one? If the warmed index performs satisfactorily, then
Hi,
recently we're experiencing OOMEs (GC overhead limit exceeded) in our
searches. Therefore I want to get some clarification on heap and cache
configuration.
This is the situation:
- Solr 1.4.1 running on tomcat 6, Sun JVM 1.6.0_13 64bit
- JVM Heap Params: -Xmx8G -XX:MaxPermSize=256m
Hi Chris,
On 24/01/11 21:18, Chris Hostetter wrote:
: I notice that in the schema, it is only possible to specify a Analyzer class,
: but not a Factory class as for the other elements (Tokenizer, Fitler, etc.).
: This limits the use of this feature, as it is impossible to specify parameters
:
Hi,
as the biggest parts of our jvm heap are used by solr caches I asked myself
if it wouldn't make sense to run solr caches backed by terracotta's
bigmemory (http://www.terracotta.org/bigmemory).
The goal is to reduce the time needed for full / stop-the-world GC cycles,
as with our 8GB heap the
By warmed index you only mean warming the SOLR cache or OS cache? As I said
our index is updated every hour so I am not sure how much SOLR cache would
be helpful but OS cache should still be helpful, right?
I haven't compared the results with a proper script but from manual testing
here are some
Hi,
Are you sure you need CMS incremental mode? It's only advised when running on
a machine with one or two processors. If you have more, you should consider
disabling the incremental flags.
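Concretely, the flags in question are the CMS incremental ones; disabling them might look like this (a sketch; the exact flag set depends on your existing JVM options):

```
-XX:+UseConcMarkSweepGC -XX:-CMSIncrementalMode -XX:-CMSIncrementalPacing
```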
Cheers,
On Monday 24 January 2011 19:32:38 Simon Wistow wrote:
We have two slaves replicating off one
Frankly, this puzzles me. It *looks* like it should be OK. One warning, the
analysis page sometimes is a bit misleading, so beware of that.
But the output of your queries makes it look like the query is parsing as you
expect, which leaves the question of whether your index contains what
you think
On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote:
Hi,
recently we're experiencing OOMEs (GC overhead limit exceeded) in our
searches. Therefore I want to get some clarification on heap and cache
configuration.
This is the situation:
- Solr 1.4.1 running on tomcat 6, Sun JVM
Hi Siva,
try using the Solr Stats Component
http://wiki.apache.org/solr/StatsComponent
similar to
select/?q=*:*&stats=true&stats.field={your-weight-field}&stats.facet={your-facet-field}
and get the sum field from the response. You may need to resort the weighted
facet counts to get a descending
Hi Eric,
You are right, there is a copy field to EdgeNgram. I tried the configuration
but it is not working as expected.
Configuration I tried:
<fieldType name="query" class="solr.TextField" positionIncrementGap="100"
    termVectors="true">
  <analyzer>
I would just use Nutch and specify the -solr param on the command line. That
will add the extracted content to your instance of Solr.
Adam
Sent from my iPhone
On Jan 25, 2011, at 5:29 AM, pankaj bhatt panbh...@gmail.com wrote:
Hi All,
I need to index the documents presents in my file
On Tue, Jan 25, 2011 at 2:06 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
On Tuesday 25 January 2011 11:54:55 Martin Grotzke wrote:
Hi,
recently we're experiencing OOMEs (GC overhead limit exceeded) in our
searches. Therefore I want to get some clarification on heap and cache
On 25.01.11 11.30, Erlend Garåsen wrote:
Tika version 0.8 is not included in the latest release/trunk from SVN.
Ouch, I wrote "not" instead of "now". Sorry, I replied in a hurry.
And to clarify, by content I mean the main content of a Word file.
Title and other kinds of metadata are
On Tue, Jan 25, 2011 at 3:46 PM, Dinesh mdineshkuma...@karunya.edu.in wrote:
No, I actually changed the directory to mine where I stored the log files.. it
is /home/exam/apa..solr/example/exampledocs
I specified it in a Solr schema.. I created a DataImportHandler for that in
try.xml.. then
Hi Martin,
are you sure that your GC is well tuned?
A request that needs more than a minute isn't normal, even when I
consider all the other postings about response performance...
Regards
Thanks Erlend.
Not used SVN before, but have managed to download and build latest trunk
code.
Now I'm getting an error when trying to access the admin page (via
Jetty) because I specify HTMLStripStandardTokenizerFactory in my
schema.xml, but this appears to be no longer supplied as part of
I use a lot of dynamic fields, so looking at my schema isn't a good way to
see all the field names that may be indexed across all documents. Is there a
way to query solr for that information? All field names that are indexed, or
stored? Possibly a count by field name? Is there any other metadata
OK, got past the schema.xml problem, but now I'm back to square one.
I can index the contents of binary files (Word, PDF etc...), as well as
text files, but it won't index the content of files inside a zip.
As an example, I have two txt files - doc1.txt and doc2.txt. If I index
either of
You can query all the indexed or stored fields (including dynamic fields)
using the LukeRequestHandler: http://localhost:8983/solr/example/admin/luke
See also: http://wiki.apache.org/solr/LukeRequestHandler
Regards,
*
**Juan G. Grande*
-- Solr Consultant @ http://www.plugtree.com
-- Blog @
Thanks Adam, it seems like Nutch would solve most of my concerns.
It would be great if you could share some resources for Nutch with us.
/ Pankaj Bhatt.
On Tue, Jan 25, 2011 at 7:21 PM, Estrada Groups
estrada.adam.gro...@gmail.com wrote:
I would just use Nutch and specify the -solr param on the
Hi Gary,
The latest Solr Trunk was able to extract and index the contents of the zip
file using the ExtractingRequestHandler.
The snapshot of Trunk we worked upon had the Tika 0.8 snapshot jars and
worked pretty well.
Tested again with sample url and works fine -
curl
Hi ,
I have written a lucene custom filter.
I could not figure out how to configure Solr to pick this custom filter
for search.
How to configure Solr to pick my custom filter?
Will the Solr standard search handler pick this custom filter?
Thanks,
Valiveti
So, the index is a list of tokens per column, right?
There's a table per column that lists the analyzed tokens?
And the tokens per column are represented as what, system integers? 32/64 bit
unsigned ints?
Dennis Gearon
Signature Warning
It is always a good idea to learn
Why does it matter? You can't really get at them unless you store them.
I don't know what table per column means, there's nothing in Solr
architecture called a table or a column. Although by column you
probably mean more or less Solr field. There is nothing like a
table in Solr.
Solr is
Let's back up here because now I'm not clear what you actually want.
EdgeNGrams
are a way of matching substrings, which is what's happening here. Of course
searching for apple matches any of the three examples, just as searching for
apple without grams would match; that's the expected behavior.
So,
Anyone?
On Tue, Jan 25, 2011 at 12:57 AM, Salman Akram
salman.ak...@northbaysolutions.net wrote:
Just to add one thing, in case it makes a difference.
Max document size on which highlighting needs to be done is a few hundred
KB (in the file system). In the index it's compressed, so it should be much
Presumably your custom filter is in a jar file. Drop that jar file in
solr_home/lib
and refer it from your schema.xml file by its full name
(e.g. com.yourcompany.filter.yourcustomfilter) just like the other filters
and it should
work fine.
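In schema.xml that registration might look like the sketch below (the factory class name is the placeholder from above, and the field type name is made up):

```xml
<fieldType name="text_custom" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="com.yourcompany.filter.YourCustomFilterFactory"/>
  </analyzer>
</fieldType>
```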
You can also put your jar anywhere you'd like and alter
That's exactly what I wanted, thanks. Any idea what
<long name="version">1294513299077</long>
refers to under the index section? I have 2 cores on one Tomcat instance,
and 1 on a second instance (different server) and all 3 have different
numbers for version, so I don't think it's the version of
The index version. Can be used in replication to determine whether to
replicate or not.
On Tuesday 25 January 2011 20:30:21 kenf_nc wrote:
refers to under the index section? I have 2 cores on one Tomcat instance,
and 1 on a second instance (different server) and all 3 have different
numbers
Hi Eric,
What I want here is, let's say I have 3 documents like
[pineapple vers apple, milk with apple, apple milk shake]
and if I search for apple, it should return only apple milk shake,
because that term alone starts with the word apple which I typed in. It
should not bring the others, and if
There are a few tutorials out there.
1. http://wiki.apache.org/nutch/RunningNutchAndSolr (not the most practical)
2. http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ (similar to 1.)
3. Build the latest from branch
http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/ and read
this
I take that back... I am currently using version 1.2; make sure
that the latest versions of Tika and PDFBox are in the contrib folder.
1.3 is structured a bit differently and it doesn't look like there is
a contrib directory. Maybe one of the Nutch contributors can comment
on this?
Adam
On
This is to announce Berlin Buzzwords 2011, the second edition of the
successful conference on scalable and open search, data processing and data
storage in Germany, taking place in Berlin.
Call for Presentations Berlin Buzzwords
Hi Eric,
Thanks for the reply.
I did see some entries in solrconfig.xml for adding custom
responseHandlers, queryParsers and queryResponseWriters,
but could not find the one for adding a custom filter.
Could you point to the exact location or syntax to be used.
Thanks,
Valiveti
I haven't figured out any way to achieve that AT ALL without making a
separate Solr index just to serve autosuggest queries. At least when you
want to auto-suggest on a multi-value field. Someone posted a crazy
tricky way to do it with a single-valued field a while ago. If you
can/are willing
Then you don't need NGrams at all. A wildcard will suffice or you can use the
TermsComponent.
If these strings are indexed as single tokens (KeywordTokenizer with
LowercaseFilter) you can simply do field:app* to retrieve the apple milk
shake. You can also use the string field type but then you
Oh, I should perhaps mention that EdgeNGrams will yield results a lot quicker
than using wildcards, at the cost of a larger index. You should, of course, use
EdgeNGrams if you worry about performance and have a huge index and a high
number of queries per second.
Then you don't need NGrams at all. A
The index contains around 1.5 million documents. As this is used for
autosuggest feature, performance is an important factor.
So it looks like, using EdgeNgram, it is difficult to achieve the
following
Result should return only those terms where search letter is matching with
the first
Hi
I am searching for a way to specify optional terms in a query (terms that
don't need to match, but if they do match should influence the scoring).
Using the dismax parser a query like this:
<str name="mm">2</str>
<str name="debugQuery">on</str>
<str name="q">+lorem ipsum dolor amet</str>
<str name="qf">content</str>
<str
Ah, sorry, I got confused about your requirements. If you just want to
match at the beginning of the field, it may well be possible using
edgegrams or a wildcard, if you have a single-valued field. Do you have a
single-valued or a multi-valued field? That is, does each document have
just one
With the 'lucene' query parser?
include q.op=OR and then put a + (mandatory) in front of every term
in the 'q' that is NOT optional; the rest will be optional. I think
that will do what you want.
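For example, with the lucene query parser (a sketch; lorem mandatory, the remaining terms optional, and the default field assumed to be content):

```
q=+lorem ipsum dolor amet&q.op=OR&df=content
```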
Jonathan
On 1/25/2011 5:07 PM, Daniel Pötzinger wrote:
Hi
I am searching for a way to specify
Right now our configuration says multiValued=true. But that need not be
true in our case. Will make it false, try it, and update this thread with
more details..
Thank you Markus. I have added a few more fields to schema.xml.
Now it looks like the products are getting indexed, but there are no search
results.
In Magento, if I configure Solr as the search engine, search is not
returning any results. If I change the search engine to use Magento's
inbuilt MySQL,
Hello list,
Apologies if this was already asked; I haven't found the answer in the archive.
I've been out of this list for quite some time now, hence the question.
I am looking at a good way to package a project based on maven2 that would
create me a solr-based webapp.
I would expect such projects as the
I am saying there is a list of tokens that have been parsed (a table of them)
for each column? Or one for the whole index?
Dennis Gearon
First, let's be sure we're talking about the same thing. My response was for
adding
a filter to your analysis chain for a field in Schema.xml. Are you talking
about a different
sort of filter?
Best
Erick
On Tue, Jan 25, 2011 at 4:09 PM, Valiveti narasimha.valiv...@gmail.com wrote:
Hi Eric,
This should shed some light on the matter
http://lucene.apache.org/java/2_9_0/fileformats.html
I am saying there is a list of tokens that have been parsed (a table of
them) for each column? Or one for the whole index?
Dennis Gearon
Dear Stefan,
thank you for your help!
Well, I wrote a small script, even if not JSON, but it works:
<script><![CDATA[
function my_serialize(row)
{
  var st = "";
  st = row.get('stt_id') + " || " +
       row.get('stt_name') + " || " +
       row.get('stt_date_from') + " ||
There aren't any tables involved. There's basically one list (per field) of
unique tokens for the entire index, and also, a list for each token of which
documents contain that token. Which is efficiently encoded, but I don't know
the details of that encoding, maybe someone who does can tell
OK, try this.
Use some analysis chain for your field like:
<analyzer>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
This can be a multiValued field, BTW.
now use the TermsComponent to fetch your data. See:
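A TermsComponent request for that setup might look like the following (assuming the field is called suggest and a /terms handler is enabled in solrconfig.xml; both names are placeholders):

```
http://localhost:8983/solr/terms?terms.fl=suggest&terms.prefix=app&terms.limit=10
```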
There's almost no information to go on here. Please review:
http://wiki.apache.org/solr/UsingMailingLists
Best
Erick
On Tue, Jan 25, 2011 at 6:13 PM, Sandhya Padala geend...@gmail.com wrote:
Thank you Markus. I have added few more fields to schema.xml.
Now looks like the products are getting
Hello
I am searching for a way to specify optional terms in a query (terms that
don't need to match, but if they do match should influence the scoring).
Using the dismax parser a query like this:
<str name="mm">2</str>
<str name="debugQuery">on</str>
<str name="q">+lorem ipsum dolor amet</str>
<str name="qf">content</str>
I am not sure I really understand what is meant by clean=false.
In my understanding, for a full-import with the default clean=true, it will
blow away all documents in the existing index, then do a full import of data
from a table into the index. Is that right?
Then for clean=false, my understanding is that
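For reference, the clean parameter is just passed on the DIH request, e.g. (assuming the handler is registered at /dataimport):

```
http://localhost:8983/solr/dataimport?command=full-import&clean=false
```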