How can I use payloads for boosting?
What are the changes required in schema.xml?
Please provide some pointers to move ahead.
Thanks in advance.
--
Lance Norskog
goks...@gmail.com
Yes, you should use a recent Java 7. Java 6 is end-of-life and no longer
supported by Oracle. Also, read up on the various garbage collectors. It
is a complex topic and there are many guides online.
In particular there is a problem in some Java 6 releases that causes a
massive memory leak in
, it is working properly, results are
stable and correct.
Please help me to make solr results consistent.
Thanks in Advance.
On 10/13/2013 10:02 AM, Shawn Heisey wrote:
On 10/13/2013 10:16 AM, Josh Lincoln wrote:
I have a large solr response in xml format and would like to import it into
a new solr collection. I'm able to use DIH with solrEntityProcessor, but
only if I first truncate the file to a small subset of the
the solr result format while using the xpathentityprocessor
(i.e. a useSolrResultSchema option)
Any other ideas?
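A hedged sketch of a DIH config that parses Solr's own XML response format with the XPathEntityProcessor (the file path and field names are placeholders):

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="doc" processor="XPathEntityProcessor"
            url="/path/to/solr-response.xml"
            forEach="/response/result/doc">
      <!-- one <field> per stored field you want to carry over -->
      <field column="id"    xpath="/response/result/doc/str[@name='id']"/>
      <field column="title" xpath="/response/result/doc/str[@name='title']"/>
    </entity>
  </document>
</dataConfig>
```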
On Mon, Oct 14, 2013 at 6:24 PM, Lance Norskog goks...@gmail.com wrote:
On 10/13/2013 10:02 AM, Shawn Heisey wrote:
On 10/13/2013 10:16 AM, Josh Lincoln wrote:
I have
Yes, Solr/Lucene works fine with other indexes this large. There are
many indexes with hundreds of gigabytes and hundreds of millions of
documents. My experience years ago was that at this scale, searching
worked great, sorting and faceting less so, and the real problem was IT: a
200G blob of data
Solr does not by default generate unique IDs. It uses what you give as
your unique field, usually called 'id'.
What software do you use to index data from your RSS feeds? Maybe that
is creating a new 'id' field?
There is no partial update, Solr (Lucene) always rewrites the complete
You need to:
1) crawl the SVN database
2) index the files
3) make a UI that fetches the original file when you click on a search
result.
Solr only has #2. If you run a subversion web browser app, you can
download the developer-only version of the LucidWorks product and crawl
the SVN web
Block-quoting and plagiarism are two different questions.
Block-quoting is simple: break the text apart into sentences or even
paragraphs and make them separate documents. Make facets of the
post-analysis text. Now just pull counts of facets and block quotes will
be clear.
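The splitting step can be sketched in a few lines of Python (a naive regex splitter; a real pipeline would use a proper sentence detector):

```python
import re

def to_documents(doc_id, text):
    # Break a post into sentences and emit one document per sentence, so
    # facet counts over the sentence text expose repeated block quotes.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    return [{"id": f"{doc_id}-{i}", "sentence": s}
            for i, s in enumerate(sentences)]
```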
Mahout has a
Cool!
On 08/05/2013 03:34 AM, Charlie Hull wrote:
On 03/08/2013 00:50, Mark wrote:
We have a set number of known terms we want to match against.
In Index:
term one
term two
term three
I know how to match all terms of a user query against the index but
we would like to know how/if we can
Are you feeding Graphite from Solr? If so, how?
On 07/19/2013 01:02 AM, Neil Prosser wrote:
That was overnight so I was unable to track exactly what happened (I'm
going off our Graphite graphs here).
Solr/Lucene does not automatically backfill a new field when you add it,
the way DBMS systems do. Instead, all data for a field is added at index
time. To get the new field, you have to reload all of your data.
This is also true for deleting fields. If you remove a field, that data
does not go away until you
I don't know about jvm crashes, but it is known that the Java 6 jvm had
various problems supporting Solr, including the 20-30 series. A lot of
people use the final jvm release (I think 6_30).
On 07/16/2013 12:25 PM, neoman wrote:
Hello Everyone,
We are using solrcloud with Tomcat in our
Norms stay in the index even if you delete all of the data. If you just
changed the schema, emptied the index, and tested again, you've still
got norms in there.
You can examine the index with Luke to verify this.
On 07/09/2013 08:57 PM, William Bell wrote:
I have a field that has
Also, total index file size. At 200-300GB, managing an index becomes a pain.
Lance
On 07/08/2013 07:28 AM, Jack Krupansky wrote:
Other that the per-node/per-collection limit of 2 billion documents
per Lucene index, most of the limits of Solr are performance-based
limits - Solr can handle it,
This usually means the end server timed out.
On 06/30/2013 06:31 AM, Shahar Davidson wrote:
Hi all,
We're getting the below exception sporadically when using distributed search.
(using Solr 4.2.1)
Note that 'core_3' is one of the cores mentioned in the 'shards' parameter.
Any ideas anyone?
The MappingCharFilter allows you to map both characters to one
character. If you do this during indexing and querying, searching with
one should find the other. This is sort of like synonyms, but on a
character-by-character basis.
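As a sketch, the schema.xml wiring might look like this (the type and mapping file names are hypothetical):

```xml
<!-- schema.xml: apply the same character mapping at index and query time -->
<fieldType name="text_mapped" class="solr.TextField">
  <analyzer>
    <!-- mapping-chars.txt holds lines like:  "ö" => "o"  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```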
Lance
On 06/18/2013 11:08 PM, Yash Sharma wrote:
Hi,
we have
I do not know what causes the error, but this setup will not work. You need
one or three ZooKeeper servers. SolrCloud demands that a majority of the ZK
servers agree; with two ZKs this will not work.
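The majority arithmetic can be spelled out (a sketch of the rule, not ZooKeeper code):

```python
def zk_quorum(ensemble_size):
    # ZooKeeper serves requests only while a strict majority of servers agree.
    return ensemble_size // 2 + 1

# 1 server:  quorum 1 (works, but no fault tolerance)
# 2 servers: quorum 2 -- losing either server stops the ensemble
# 3 servers: quorum 2 -- one server may fail
```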
On 06/29/2013 05:47 AM, Sagar Chaturvedi wrote:
Hi,
I setup 2 solr instances on 2 different
Solr HTTP caching also supports ETags. These are unique keys for the
output of a query. If you send a query twice, and the index has not
changed, the return will be the same. The ETag is generated from the
query string and the index generation number.
If Varnish supports e-tags, you can keep
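The generation scheme can be sketched like this (the hash is an assumption; Solr's exact ETag computation may differ):

```python
import hashlib

def make_etag(query_string, index_generation):
    # Combine the query string with the index generation number, so the ETag
    # is stable between commits but changes whenever the index changes.
    raw = f"{query_string}:{index_generation}".encode("utf-8")
    return '"%s"' % hashlib.md5(raw).hexdigest()
```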
Accumulo is a BigTable/Cassandra style distributed database. It is now
an Apache Incubator project. In the README we find this gem:
Synchronize your accumulo conf directory across the cluster. As a
precaution against mis-configured systems, servers using different
configuration files will not
No, they just learned a few features and then stopped because it was
good enough, and they had a thousand other things to code.
As to REST- yes, it is worth having a coherent API. Solr is behind the
curve here. Look at the HATEOAS paradigm. It's ornate (and a really goofy
name) but it provides
One small thing: German u-umlaut is often flattened as 'ue' instead of
'u'. And the same with o-umlaut, it can be 'oe' or 'o'. I don't know if
Lucene has a good solution for this problem.
On 06/16/2013 06:44 AM, adityab wrote:
Thanks for the explanation Steve. I now see it clearly. In my case
In 4.x and trunk there is a close() method on Tokenizers and Filters. In
currently released up to 4.3, there is instead a reset(stream) method
which is how it resets a TokenizerFilter for a following document in
the same upload.
In both cases I had to track the first time the tokens are consumed,
,
Patrick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems
Patrick-
I found the problem with multiple documents. The problem was that the
API for the life cycle of a Tokenizer
patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.
Regards,
Patrick
-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems
Patrick-
I found
Patrick-
I found the problem with multiple documents. The problem was that the
API for the life cycle of a Tokenizer changed, and I only noticed part
of the change. You can now upload multiple documents in one post, and
the OpenNLPTokenizer will process each document.
You're right, the
Let's assume that the Solr record includes the database record's
timestamp field. You can make a more complex DIH stack that does a Solr
query with the SolrEntityProcessor. You can do a query that gets the
most recent timestamp in the index, and then use that in the DB update
command.
On
Distributed search does the actual search twice: once to get the scores
and again to fetch the documents with the top N scores. This algorithm
does not play well with deep searches.
On 06/02/2013 07:32 PM, Niran Fajemisin wrote:
Thanks Daniel.
That's exactly what I thought as well. I did try
I will look at these problems. Thanks for trying it out!
Lance Norskog
On 05/28/2013 10:08 PM, Patrick Mi wrote:
Hi there,
Checked out branch_4x and applied the latest patch
LUCENE-2899-current.patch however I ran into 2 problems
Followed the wiki page instruction and set up a field
If the indexed data includes positions, it should be possible to
implement ^ and $ as the first and last positions.
On 05/22/2013 04:08 AM, Oussama Jilal wrote:
There is no ^ or $ in the solr regex since the regular expression will
match tokens (not the complete indexed text). So the results
This is great; data like this is rare. Can you tell us any hardware or
throughput numbers?
On 05/17/2013 12:29 PM, Rishi Easwaran wrote:
Hi All,
It's Friday 3:00pm, warm and sunny outside, and it was a good week. Figured I'd
share some good news.
I work for AOL mail team and we use SOLR for our
If this is for the US, remove the age range feature before you get sued.
On 05/09/2013 08:41 PM, Kamal Palei wrote:
Dear SOLR experts
I might be asking a very silly question. As I am new to SOLR kindly guide
me.
I have a job site. Using SOLR to search resumes. When a HR user enters some
Great! Thank you very much Shawn.
On 05/04/2013 10:55 AM, Shawn Heisey wrote:
On 5/4/2013 11:45 AM, Shawn Heisey wrote:
Advance warning: this is a long reply.
I have condensed some relevant performance problem information into the
following wiki page:
Run checksums on all files in both master and slave, and verify that
they are the same.
TCP/IP has a checksum algorithm that was state-of-the-art in 1969.
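A sketch of the comparison in Python (MD5 here; any strong checksum works):

```python
import hashlib
import os

def index_checksums(index_dir):
    # Map each file in an index directory to its MD5 digest. Run this on
    # master and slave and diff the two results.
    sums = {}
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                sums[name] = hashlib.md5(f.read()).hexdigest()
    return sums
```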
On 04/18/2013 02:10 AM, Victor Ruiz wrote:
Also, I forgot to say... the same error started to happen again.. the index
is again corrupted
Outer distance AND NOT inner distance?
On 04/12/2013 09:02 AM, kfdroid wrote:
We currently do a radius search from a given Lat/Long point and it works
great. I have a new requirement to do a search on a larger radius from the
same point, but not include the smaller radius. Kind of a donut
Seconded. Single-stepping really is the best way to follow the logic
chains and see how the data mutates.
On 04/05/2013 06:36 AM, Erick Erickson wrote:
Then there's my lazy method. Fire up the IDE and find a test case that
looks close to something you want to understand further. Step through
Wow! That's great. And it's a lot of work, especially getting it all
keyboard-complete. Thank you.
On 03/14/2013 01:29 AM, Chantal Ackermann wrote:
Hi all,
this is not a question. I just wanted to announce that I've written a blog post
on how to set up Maven for packaging and automatic
Thank you (and Hoss)! I have found this concept elusive, and you two
have nailed it. I will be able to understand it for the 5 minutes I will
need to code with it.
Lance
On 03/09/2013 10:57 AM, David Smiley (@MITRE.org) wrote:
Just finished:
Yes, the SolrEntityProcessor can be used for this.
If you stored the original document bodies in the Solr index!
You can also download the documents in Json or CSV format and re-upload
those to old Solr. I don't know if CSV will work for your docs. If CSV
works, you can directly upload what
Do you use replication instead, or do you just have one instance?
On 02/25/2013 07:55 PM, Otis Gospodnetic wrote:
Hi,
Quick poll to see what % of Solr users use SolrCloud vs. Master-slave setup:
http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/
I have to say I'm surprised with the
Lucene and Solr have an aggressive upgrade schedule. From 3 to 4 there was a
major rewiring,
and parts are orders of magnitude faster and smaller.
If you code using Lucene, you will never upgrade to newer versions.
(I supported SolrLucene customers for 3 years, and nobody ever did.)
Cheers,
Lance
I
A side problem here is text analyzers: the analyzers have changed how
they split apart text for searching, and index and query analyzers are
matched pairs. That is, queries are analyzed to match what the analyzer did when
indexing. If you do this binary upgrade sequence, the indexed data will
not match what
I don't have the source handy. I believe that SolrCloud hard-codes 'id'
as the field name for defining shards.
On 02/04/2013 10:19 AM, Shawn Heisey wrote:
On 2/4/2013 10:58 AM, Lance Norskog wrote:
A side problem here is text analyzers: the analyzers have changed how
they split apart text
It is possible to do this with IP Multicast. The query goes out on the
multicast and all query servers read it. The servers wait for a random
amount of time, then transmit the answer. Here's the trick: it's
multicast. All of the query servers listen to each other's responses,
and drop out when
Thanks, Kai!
About removing non-nouns: the OpenNLP patch includes two simple
TokenFilters for manipulating terms with payloads. The
FilterPayloadFilter lets you keep or remove terms with given payloads.
In the demo schema.xml, there is an example type that keeps only
nouns and verbs.
There is a
For this second report, it's easy: switching from a single query server
to a sharded query is going to be slower. Virtual machines add jitter to
the performance and response time of the front-end vs the query shards.
Distributed search does 2 round-trips for each sharded query. Add these
all
This example may be out of date, if the RSS feeds from Slashdot have
changed. If you know XML and XPaths, try this:
Find an rss feed from somewhere that works. Compare the xpaths in it
v.s. the xpaths in the DIH script.
On 01/13/2013 07:38 PM, bibhor wrote:
Hi
I am trying to use the RSS
Will a field have different names in different languages? There is no
facility for 'aliases' for field names. Erick is right, this sounds like
you need query and update components to implement this. Also, you might
try using URL-encoding for the field names. This would save my sanity.
On
Try all of the links under the collection name in the lower left-hand
columns. There are several administration and monitoring tools you may find useful.
On 01/14/2013 11:45 AM, hassancrowdc wrote:
ok stats are changing, so the data is indexed. But how can I do a query with
this data, or how can I search
At this scale, your indexing job is prone to break in various ways.
If you want this to be reliable, it should be able to restart in the
middle of an upload, rather than starting over.
On 01/08/2013 10:19 PM, vijeshnair wrote:
Yes Shawn, the batchSize is -1 only and I also have the
Also, searching can be much faster if you put all of the shards on one
machine, along with the search distributor. That way, you search with multiple
simultaneous threads inside one machine. I've seen this make searches
several times faster.
On 01/03/2013 06:36 AM, Jack Krupansky wrote:
Ah... the
Please start new mail threads for new questions. This makes it much
easier to research old mail threads. Old mail is often the only
documentation for some problems.
On 01/02/2013 10:04 AM, Benjamin, Roy wrote:
Will the existing 3.6 indexes work with 4.0 binary ?
Will 3.6 solrJ clients work
What does group.query do? How is it different from q= and fq= ?
Thanks.
Indexes will not work. I have not heard of an index upgrader. If you run
your 3.6 and new 4.0 Solr at the same time, you can upload all the data
with a DataImportHandler script using the SolrEntityProcessor.
How large are your indexes? 4.1 indexes will not match 4.0, so you will
have to
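The SolrEntityProcessor approach can be sketched as a minimal DIH config (the URL and rows value are placeholders):

```xml
<dataConfig>
  <document>
    <!-- pull every document out of the old core, page by page -->
    <entity name="migrate" processor="SolrEntityProcessor"
            url="http://localhost:8983/solr/old-core"
            query="*:*" rows="500"/>
  </document>
</dataConfig>
```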
3 problems:
a- he wanted to read it locally.
b- crawling the open web is imperfect.
c- /browse needs to get at the files with the same URL as the uploader.
a and b- Try downloading the whole thing with 'wget'. It has a 'make
links point to the downloaded files' option. Wget is great.
I have
Maybe you could write a Javascript snippet that downloads and runs your
external file?
On 12/26/2012 09:12 AM, Dyer, James wrote:
I'm not very familiar with using scripting languages with Java, but having seen the
DIH code for this, my guess is that all script code needs to be in the script
/
Cool!
On 12/25/2012 08:03 AM, Robert Muir wrote:
25 December 2012, Apache Solr™ 3.6.2 available
The Lucene PMC and Santa Claus are pleased to announce the release of
Apache Solr 3.6.2.
Solr is the popular, blazing fast open source enterprise search
platform from the Apache Lucene project. Its
A Solr filter query (fq) does a boolean query, caches the resulting Lucene
filter data structure, and uses it as a Lucene filter. After that, until you do a
full commit, using the same fq=string (you must match the string
exactly) fetches the cached data structure and uses it again as a Lucene
filter.
Have
?
On Sunday, December 23, 2012, Lance Norskog wrote:
Please start a new thread.
Thanks!
On 12/22/2012 11:03 AM, J Mohamed Zahoor wrote:
Hi
I have a word completion requirement where i need to pick result from two
indexed fields.
The trick is i need to pick top 5 results from each field
Please start a new thread.
Thanks!
On 12/22/2012 11:03 AM, J Mohamed Zahoor wrote:
Hi
I have a word completion requirement where i need to pick result from two
indexed fields.
The trick is i need to pick top 5 results from each field and display as
suggestions.
If i set fq as field1:XXX
The only sure way to get the last searchable document is to use a
timestamp or sequence number in the document. I do not think that using
a timestamp with default=NOW will give a unique timestamp, so you need
your own sequence number.
On 12/19/2012 10:17 PM, Joe wrote:
I'm using SOLR 4 for
To be clear: 1) is fine. Lucene index updates are carefully sequenced so
that the index is never in a bogus state. All data files are written and
flushed to disk, then the segments.* files are written that match the
data files. You can capture the files with a set of hard links to create
a
Do you use rounding in your dates? You can index a date rounded to the
nearest minute, N minutes, hour or day. This way a range query has to
look at such a small number of terms that you may not need to tune the
precision step. Hunt for NOW/DAY or 5DAYS in the queries.
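For example (the field name is hypothetical; rounding both endpoints with date math keeps the number of terms examined small):

```text
fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]
```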
to build, save, and query the bitmap
whereas working on top of existing functionality seems to me a lot more
maintainable on the user's part.
~ David
From: Lance Norskog-2 [via Lucene] [ml-node+s472066n4025579...@n3.nabble.com]
Sent: Sunday, December 09, 2012 6:35 PM
Maybe these are text encoding markers?
- Original Message -
| From: Eva Lacy e...@lacy.ie
| To: solr-user@lucene.apache.org
| Sent: Thursday, November 29, 2012 3:53:07 AM
| Subject: Re: Downloading files from the solr replication Handler
|
| I tried downloading them with my browser and
sagarzond- you are trying to embed a recommendation system into search.
Recommendations are inherently a matrix problem, where Solr and other search
engines are one-dimensional databases. What you have is a sparse user-product
matrix. This book has a good explanation of recommender systems:
You don't need the transformers.
I think the paths should be what is in the XML file.
forEach="/add"
And the paths need to use the syntax for name=fname and name=number. I
think this is it, but you should make sure.
xpath="/add/doc/field[@name='fname']"
xpath="/add/doc/field[@name='number']"
Look
- http://sematext.com/spm/index.html
| Search Analytics - http://sematext.com/search-analytics/index.html
|
|
|
|
| On Sat, Nov 24, 2012 at 9:30 PM, Lance Norskog goks...@gmail.com
| wrote:
|
| sagarzond- you are trying to embed a recommendation system into
| search.
| Recommendations
| dataSource=null
I think this should not be here. The datasource should default to the
dataSource listing. And 'rootEntity=true' should be in the
XPathEntityProcessor block, because you are adding each file as one document.
- Original Message -
| From: Spadez
I think this means the pattern did not match any files:
<str name="Total Rows Fetched">0</str>
The wiki example includes a '^' at the beginning of the filename pattern, which
anchors the match at the start of the name.
http://wiki.apache.org/solr/DataImportHandler#Transformers_Example
More:
Add rootEntity=true. It
LucidFind collects several sources of information in one searchable archive:
http://find.searchhub.org/?q=sort=#%2Fp%3Asolr
- Original Message -
| From: Dmitry Kan dmitry@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Sunday, November 11, 2012 2:24:21 AM
| Subject: Re: More
You can debug this with the 'Analysis' page in the Solr UI. You pick
'text_general' and then give words with umlauts in the text box for indexing
and queries.
Lance
- Original Message -
| From: Daniel Brügge daniel.brue...@googlemail.com
| To: solr-user@lucene.apache.org
| Sent:
LucidFind is a searchable archive of Solr documentation and email lists:
http://find.searchhub.org/?q=solrcloud
- Original Message -
| From: Jack Krupansky j...@basetechnology.com
| To: solr-user@lucene.apache.org
| Sent: Monday, November 5, 2012 4:44:46 AM
| Subject: Re: Where to get
The question you meant to ask is: Does MoreLikeThis support Distributed
Search? and the answer apparently is no. This is the issue to get it working:
https://issues.apache.org/jira/browse/SOLR-788
(Distributed Search is independent of SolrCloud.) If you want to make unit
tests, that would
can post that and/or include it in your sample XML
| file...
|
| Best
| Erick
|
|
| On Fri, Nov 2, 2012 at 10:02 AM, Dotan Cohen dotanco...@gmail.com
| wrote:
|
| On Thu, Nov 1, 2012 at 9:28 PM, Lance Norskog goks...@gmail.com
| wrote:
| Have you uploaded data with that field populated? Solr
Have you uploaded data with that field populated? Solr is not like a relational
database. It does not automatically populate a new field when you add it to the
schema. If you sort on a field, a document with no data in that field comes
first or last (I don't know which).
- Original
1) Do you use compound files (CFS)? This adds a lot of overhead to merging.
2) Does ES use the same merge policy code as Solr?
In solrconfig.xml, here are the lines that control segment merging. You can
probably set mergeFactor to 20 and cut the amount of disk I/O.
<!-- Expert: Merge Policy ... -->
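In 3.x/4.x-era solrconfig.xml that section looks roughly like this (the value 20 is the suggestion above, not the default):

```xml
<!-- Higher mergeFactor means fewer, larger merges and less merge I/O while
     indexing, at the cost of more segments to search -->
<mergeFactor>20</mergeFactor>
```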
understand the real question here. What is the
| metadata.
|
| I mean, q=x&fl=* gives you all the (stored) fields for documents
| matching
| the query.
|
| What else is there?
|
| -- Jack Krupansky
|
| -Original Message-
| From: Lance Norskog
| Sent: Friday, October 26, 2012 9:42 PM
.
|
| Erik
|
|
| On Oct 27, 2012, at 04:09 , Lance Norskog wrote:
|
| Nope! Each document comes back with its own list of stored fields.
| If you want to find all fields in an index, you have to fetch
| every last document and OR in the fields in that document. There
| is no Solr call
Aha! Andrzej has not built a 4.0 release version. You need to check out the
source and compile your own.
http://code.google.com/p/luke/downloads/list
- Original Message -
| From: Carrie Coy c...@ssww.com
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 7:33:45 AM
|
/browse/SOLR-2141) which goes back to
| October 2010 and is flagged as Resolved: Cannot Reproduce.
|
|
| 2012/10/20 Lance Norskog goks...@gmail.com:
| If it worked before and does not work now, I don't think you are
| doing anything wrong :)
|
| Do you have a different version of your JDBC driver
Ah, there's the problem- what is a fast way to fetch all fields in a
collection, including dynamic fields?
- Original Message -
| From: Otis Gospodnetic otis.gospodne...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Friday, October 26, 2012 3:05:04 PM
| Subject: Re: Get metadata
A side point: in fact, the connection between MBA and grade is not lost. The
values in a multi-valued field are stored in order. You can have separate
multi-valued fields with matching entries, and the values will be fetched in
order and you can match them by counting. This is not database-ish,
Do other fields get added?
Do these fields have type problems? I.e. is 'attr1' a number and you are adding
a string?
There is a logging EP that I think shows the data found- I don't know how to
use it.
Is it possible to post the whole DIH script?
- Original Message -
| From: Billy
If it worked before and does not work now, I don't think you are doing anything
wrong :)
Do you have a different version of your JDBC driver?
Can you make a unit test with a minimal DIH script and schema?
Or, scan through all of the JIRA issues against the DIH from your old Solr
capture date.
There is no 'RAMDirectory backed by disk' feature. The MMapDirectory uses the
operating system to do almost exactly the same thing, in a much better way.
That is why it is the default.
- Original Message -
| From: deniz denizdurmu...@gmail.com
| To: solr-user@lucene.apache.org
| Sent:
I do not know how to load an index from disk into a RAMDirectory in Solr.
- Original Message -
| From: deniz denizdurmu...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 17, 2012 12:15:52 AM
| Subject: Re: Flushing RAM to disk
|
| I heard about MMapDirectory -
CheckIndex prints these stats.
java -cp lucene-core-WHATEVER.jar org.apache.lucene.index.CheckIndex
- Original Message -
| From: Shawn Heisey s...@elyograg.org
| To: solr-user@lucene.apache.org
| Sent: Monday, October 15, 2012 9:46:33 PM
| Subject: Re: How many documents in each Lucene
http://find.searchhub.org/?q=autosuggest+OR+autocomplete
- Original Message -
| From: Rahul Paul rahul.p...@iiitb.org
| To: solr-user@lucene.apache.org
| Sent: Monday, October 15, 2012 9:01:14 PM
| Subject: Solr Autocomplete
|
| Hi,
| I am using mysql for solr indexing data in solr. I
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Can anybody point me to the source of this error or a workaround?
Thanks,
Tricia
I want an update processor that runs Translation Party.
http://translationparty.com/
http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/
- Original Message -
| From: SUJIT PAL sujit@comcast.net
| To:
Hapax legomena (terms with DF of 1) are very often typos. You can automatically
build a stopword file from these. If you want to be picky, you can use only
words with a very small distance from words with much larger DF.
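A sketch of the stopword-building step (document frequency over pre-tokenized docs; the edit-distance refinement is left out):

```python
from collections import Counter

def hapax_stopwords(tokenized_docs):
    # Document frequency: in how many docs each term appears.
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))
    # Terms seen in exactly one document are stopword candidates.
    return sorted(term for term, n in df.items() if n == 1)
```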
- Original Message -
| From: Robert Muir rcm...@gmail.com
| To:
Study index merging. This is awesome.
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Jame- opening lots of segments is not a problem. A major performance problem
you will find is 'Large Pages'. This is an operating-system strategy for
managing servers with 10s of
Thanks, everyone. This is the problem: $sentence is a NamedList node, with a
name and a value (any Java object). I want its value subnode:
#foreach($sentence in $outer)
  #set($sentence = $sentence.value)
|
| Here is the XML from a search result:
| <lst name="outer">
|   <lst name="sentence">
|     <int
If it is a simple text file, does that text file start with the UTF-16 BOM
marker?
http://unicode.org/faq/utf_bom.html
Also, do UTF-8 files work? If not, then your setup has a basic encoding problem.
And, when you post such a text file (for example, with curl), use the UTF-16
charset mime-type:
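For example (the URL and file name are placeholders; the first two commands only demonstrate checking for the BOM):

```shell
# A UTF-16LE file must start with the BOM bytes FF FE (FE FF for big-endian).
printf '\377\376' > /tmp/update.xml      # sketch: write just a UTF-16LE BOM
head -c 2 /tmp/update.xml | od -An -tx1  # shows the bytes ff fe

# Post with an explicit charset so the server decodes UTF-16 correctly:
# curl 'http://localhost:8983/solr/update' \
#      -H 'Content-Type: text/xml; charset=UTF-16' \
#      --data-binary @/tmp/update.xml
```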
You can find Solr information with this:
http://find.searchhub.org/?q=zookeeper+cluster
http://find.searchhub.org/link?url=http://wiki.apache.org/solr/SolrCloud
- Original Message -
| From: varun srivastava varunmail...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Saturday,