The best way to delete several documents?

2013-01-27 Thread Bruno Mannina

Dear Solr users,

Every Friday I need to delete some documents from my Solr DB (around
100~200 docs).


Could you help me choose the best way to delete these documents?
- I have the unique ID of each document.

Another question:
How can I disable the possibility of doing:

http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true

from a web browser?

I would like to allow operations on my DB only when using a command line like:

java -jar -Durl="http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true" post.jar

Is it possible?

Thanks a lot,
Bruno



Re: [ANNOUNCE] Web Crawler

2013-01-27 Thread SivaKarthik
Hi,
 I'm trying to configure Crawl-Anywhere version 3.0.3 on my local system.
 I'm following the steps from the page
http://www.crawl-anywhere.com/installation-v300/
 but crawlerws is failing and throwing the error message below in the
browser:
  http://localhost:8080/crawlerws/

<error>
   <errno>1</errno>
   <errmsg>Missing action</errmsg>
</error>

Not sure what I'm doing wrong. Could you please help me resolve the
problem? Thank you.





indexing Text file in solr

2013-01-27 Thread hadyelsahar
I have a large Arabic text file that contains tweets, one tweet per line,
which I want to index in Solr such that each line of the file is indexed
as a separate Solr document.

What I have tried so far:

- I know how to index SQL database records in Solr
- I know how to change the Solr schema to fit the data and how to work with the
Data Import Handler
- I know the queries used to index data in Solr

What I want:

- to know how to index a text file in Solr so that each line is considered a
separate Solr document (a rough sketch of one approach is below)
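Something like the following is what I am imagining (a rough, untested sketch;
the field names id and tweet_text, the file name tweets.txt and the localhost
URL are only placeholders for my setup):

    #!/bin/bash
    # Rough sketch: index each line of tweets.txt as its own Solr document.
    # Assumes fields "id" and "tweet_text" exist in the schema; XML escaping
    # of the tweet text (&, <, >) is omitted here for brevity.
    SOLR_UPDATE="http://localhost:8983/solr/update"
    n=0
    docs="<add>"
    while IFS= read -r line; do
      n=$((n+1))
      docs="$docs<doc><field name=\"id\">tweet-$n</field><field name=\"tweet_text\">$line</field></doc>"
    done < tweets.txt
    docs="$docs</add>"

    curl -s "$SOLR_UPDATE?commit=true" \
         -H "Content-Type: text/xml; charset=utf-8" \
         --data-binary "$docs"

For a really large file it would be better to post in batches of a few thousand
lines rather than one request.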





Re: The best way to delete several documents?

2013-01-27 Thread Marcin Rzewucki
Hi,

The best option is if you can find a single query matching all the docs you want
to remove. If that is not simple, you can use the following syntax: id:(1 2 3 4 5)
to remove a group of docs by ID (provided your default query operator is OR).
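For example, something like this (untested sketch; it assumes your uniqueKey
field is named id and Solr runs at localhost:8983):

    # delete a group of documents by ID in a single request
    curl "http://localhost:8983/solr/update?commit=true" \
         -H "Content-Type: text/xml" \
         --data-binary '<delete><query>id:(298253 298254 298255)</query></delete>'

    # or list the IDs explicitly instead of using a query
    curl "http://localhost:8983/solr/update?commit=true" \
         -H "Content-Type: text/xml" \
         --data-binary '<delete><id>298253</id><id>298254</id><id>298255</id></delete>'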

Regards.

On 27 January 2013 11:47, Bruno Mannina bmann...@free.fr wrote:

 Dear Solr users,

 Every Friday I need to delete some documents on my solr db (around 100~200
 docs).

 Could you help me to choose the best way to delete these documents.
 - I have the unique ID of each documents

 Another question:
 How can I disable to possibility to do:

 http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true
 by using a webbrowser.

 I would like to do operations on my DB only if use a command line like
 java -jar -DUrl=

 http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true post.jar

 is it possible?

 Thanks a lot,
 Bruno




Re: The best way to delete several documents?

2013-01-27 Thread Bruno Mannina

Hi,

Even if I have one or two thousand IDs?

Thanks

Le 27/01/2013 13:15, Marcin Rzewucki a écrit :

Hi,

The best is if you could find a query for all docs you want to remove. If
this is not simple you can use the following syntax: id: (1 2 3 4 5) to
remove group of docs by ID (and if your default query operator is OR).

Regards.

On 27 January 2013 11:47, Bruno Mannina bmann...@free.fr wrote:


Dear Solr users,

Every Friday I need to delete some documents on my solr db (around 100~200
docs).

Could you help me to choose the best way to delete these documents.
- I have the unique ID of each documents

Another question:
How can I disable to possibility to do:

http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true
by using a webbrowser.

I would like to do operations on my DB only if use a command line like
java -jar -DUrl=

http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true post.jar

is it possible?

Thanks a lot,
Bruno






Re: The best way to delete several documents?

2013-01-27 Thread Marcin Rzewucki
You can write a script and remove, say, 50 docs in one call. It's always better
than removing them one by one.
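A rough, untested sketch of such a script (it assumes the IDs are in a file
ids.txt, one per line, and that the uniqueKey field is named id):

    #!/bin/bash
    # Delete documents in batches of 50 IDs per update request.
    SOLR_UPDATE="http://localhost:8983/solr/update"

    # split ids.txt into chunks of 50 lines and send one <delete> per chunk
    split -l 50 ids.txt /tmp/idchunk.
    for chunk in /tmp/idchunk.*; do
      body="<delete>"
      while IFS= read -r id; do
        body="$body<id>$id</id>"
      done < "$chunk"
      body="$body</delete>"
      curl -s "$SOLR_UPDATE" -H "Content-Type: text/xml" --data-binary "$body"
    done

    # commit once at the end instead of once per batch
    curl -s "$SOLR_UPDATE?commit=true"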
Regards.


On 27 January 2013 13:17, Bruno Mannina bmann...@free.fr wrote:

 Hi,

 Even If I have one or two thousands of Id ?

 Thanks

 Le 27/01/2013 13:15, Marcin Rzewucki a écrit :

 Hi,

 The best is if you could find a query for all docs you want to remove. If
 this is not simple you can use the following syntax: id: (1 2 3 4 5) to
 remove group of docs by ID (and if your default query operator is OR).

 Regards.

 On 27 January 2013 11:47, Bruno Mannina bmann...@free.fr wrote:

  Dear Solr users,

 Every Friday I need to delete some documents on my solr db (around
 100~200
 docs).

 Could you help me to choose the best way to delete these documents.
 - I have the unique ID of each documents

 Another question:
 How can I disable to possibility to do:

 http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true

 by using a webbrowser.

 I would like to do operations on my DB only if use a command line like
 java -jar -DUrl=

 http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true post.jar


 is it possible?

 Thanks a lot,
 Bruno






Re: The best way to delete several documents?

2013-01-27 Thread Bruno Mannina

Yep, OK, thanks!

Le 27/01/2013 13:27, Marcin Rzewucki a écrit :

You can write a script and remove say 50 docs in 1 call. It's always better
than removing 1 by 1.
Regards.


On 27 January 2013 13:17, Bruno Mannina bmann...@free.fr wrote:


Hi,

Even If I have one or two thousands of Id ?

Thanks

Le 27/01/2013 13:15, Marcin Rzewucki a écrit :


Hi,

The best is if you could find a query for all docs you want to remove. If
this is not simple you can use the following syntax: id: (1 2 3 4 5) to
remove group of docs by ID (and if your default query operator is OR).

Regards.

On 27 January 2013 11:47, Bruno Mannina bmann...@free.fr wrote:

  Dear Solr users,

Every Friday I need to delete some documents on my solr db (around
100~200
docs).

Could you help me to choose the best way to delete these documents.
- I have the unique ID of each documents

Another question:
How can I disable to possibility to do:

http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true

by using a webbrowser.

I would like to do operations on my DB only if use a command line like
java -jar -DUrl=

http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true post.jar


is it possible?

Thanks a lot,
Bruno







Re: [ANNOUNCE] Web Crawler

2013-01-27 Thread O. Klein
This is actually showing that it works.

crawlerws is used by the Crawl Anywhere UI, which passes it the correct
arguments when needed.




SivaKarthik wrote
 Hii,
  I'm trying to configure crawl-anywhere 3.0.3 version in my local system..
  i'm following the steps from the page
 http://www.crawl-anywhere.com/installation-v300/
  but, crawlerws is failing and throwing the below error message in the
 brower
   http://localhost:8080/crawlerws/
 <error>
    <errno>1</errno>
    <errmsg>Missing action</errmsg>
 </error>
 Not sure where im doing wrong.. could please help me out to resolve the
 problem.. thank you.







Re: SolrCloud index recovery

2013-01-27 Thread Marcin Rzewucki
Hi Mark,

I see no such issues in Solr 4.1. It seems to work fine.

Thanks.

On 24 January 2013 03:58, Mark Miller markrmil...@gmail.com wrote:

 Yeah, I don't know what you are seeing offhand. You might try Solr 4.1 and
 see if it's something that has been resolved.

 - Mark

 On Jan 23, 2013, at 3:14 PM, Marcin Rzewucki mrzewu...@gmail.com wrote:

  Guys, I pasted you the full log (see pastebin url). Yes, it is Solr4.0. 2
  cores are in sync, but the 3rd one is not:
  INFO: PeerSync Recovery was not successful - trying replication.
 core=ofac
  INFO: Starting Replication Recovery. core=ofac
 
  It started replication and even says it is done successfully:
  INFO: Replication Recovery was successful - registering as Active.
 core=ofac
 
  but index files were not downloaded. It's empty, no docs. Also I do not
 see
  replication.properties file. tlog dir is empty and index dir
 contains
  only 3 files: segments.gen, segments_7 and write.lock
  It seems to be tough issue. Anyway, thanks for your help.
 
 
  On 23 January 2013 15:41, Mark Miller markrmil...@gmail.com wrote:
 
  Looks like it shows 3 cores start - 2 with versions that decide they are
  up to date and one that replicates. The one that replicates doesn't have
  much logging showing that activity.
 
  Is this Solr 4.0?
 
  - Mark
 
  On Jan 23, 2013, at 9:27 AM, Upayavira u...@odoko.co.uk wrote:
 
  Mark,
 
  Take a peek in the pastebin url Marcin mentioned earlier
  (http://pastebin.com/qMC9kDvt) is there enough info there?
 
  Upayavira
 
  On Wed, Jan 23, 2013, at 02:04 PM, Mark Miller wrote:
  Was your full logged stripped? You are right, we need more. Yes, the
  peer
  sync failed, but then you cut out all the important stuff about the
  replication attempt that happens after.
 
  - Mark
 
  On Jan 23, 2013, at 5:28 AM, Marcin Rzewucki mrzewu...@gmail.com
  wrote:
 
  Hi,
  Previously, I took the lines related to collection I tested. Maybe
  some interesting part was missing. I'm sending the full log this time.
  It ends up with:
  INFO: Finished recovery process. core=ofac
 
  The issue I described is related to collection called ofac. I hope
  the log is meaningful now.
 
  It is trying to do the replication, but it seems to not know which
  files to download.
 
  Regards.
 
  On 23 January 2013 10:39, Upayavira u...@odoko.co.uk wrote:
  the first stage is identifying whether it can sync with transaction
  logs. It couldn't, because there's no index. So the logs you have
 shown
  make complete sense. It then says 'trying replication', which is
 what I
  would expect, and the bit you are saying has failed. So the
 interesting
  bit is likely immediately after the snippet you showed.
 
 
 
  Upayavira
 
 
 
 
 
  On Wed, Jan 23, 2013, at 07:40 AM, Marcin Rzewucki wrote:
 
  OK, so I did yet another test. I stopped solr, removed whole data/
  dir and started Solr again. Directories were recreated fine, but
  missing files were not downloaded from leader. Log is attached (I
  took the lines related to my test with 2 lines of context. I hope it
  helps.). I could find the following warning message:
 
 
  Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
  INFO: PeerSync: core=ofac url=http://replica_host:8983/solr START
  replicas=[http://leader_host:8983/solr/ofac/] nUpdates=100
  Jan 23, 2013 7:16:08 AM org.apache.solr.update.PeerSync sync
  WARNING: no frame of reference to tell of we've missed updates
  Jan 23, 2013 7:16:08 AM org.apache.solr.cloud.RecoveryStrategy
  doRecovery
  INFO: PeerSync Recovery was not successful - trying replication.
  core=ofac
 
  So it did not know which files to download ?? Could you help me to
  solve this problem ?
 
  Thanks in advance.
  Regards.
 
  On 22 January 2013 23:06, Yonik Seeley [1]yo...@lucidworks.com
  wrote:
 
  On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki
  [2]mrzewu...@gmail.com wrote:
 
  Sorry, my mistake. I did 2 tests: in the 1st I removed just index
  directory
 
  and in 2nd test I removed both index and tlog directory. Log lines
  I've
 
  sent are related to the first case. So Solr could read tlog
 directory
  in
 
  that moment.
 
  Anyway, do you have an idea why it did not download files from
 leader
  ?
 
  For your 1st test, if you only deleted the index and not the
 
  transaction logs, Solr will look at the transaction logs to try and
 
  determine if it is up to date or not (by comparing with peers).
 
  If you want to clear out all the data, remove the entire data
  directory.
 
 
 
  -Yonik
 
  [3]http://lucidworks.com
 
  References
 
  1. mailto:yo...@lucidworks.com
  2. mailto:mrzewu...@gmail.com
  3. http://lucidworks.com/
 
 
 
 




[Announce] Apache Solr 3.6.2 with RankingAlgorithm 1.4.3 available for download now -- includes experimental TimedSerialMergeScheduler

2013-01-27 Thread Nagendra Nagarajayya

Hi:

I am very excited to announce the availability of Apache Solr 3.6.2 with 
RankingAlgorithm30 1.4.3 with realtime-search support. realtime-search 
is very fast NRT and allows you not only to look up a document by id but 
also to search in realtime; see 
http://tgels.org/realtime-nrt.jsp. The update performance is about 
10,000 docs/sec. The query performance is in milliseconds, allowing you to query 
a 10m wikipedia index (complete index) in 50 ms.


This release also includes an experimental TimedSerialMergeScheduler 
http://rankingalgorithm.1050964.n5.nabble.com/TimedSerialMergerScheduler-java-allows-merges-to-be-deferred-to-a-known-time-like-11pm-or-1am-tp5706350.html that 
allows you to postpone your merges to off-hours such as 11pm or 1am, 
increasing performance.


RankingAlgorithm30 1.4.3 supports the entire Lucene query syntax, +/-
and/or boolean queries.


You can get more information about realtime-search performance from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver3.x

You can download Solr 3.6.2 with RankingAlgorithm30 1.4.3 from here:
http://solr-ra.tgels.org

Please download and give the new version a try.

Note:
1. Apache Solr 3.6.2 with RankingAlgorithm30 1.4.3 is an external project.
2. realtime-search has been contributed back to Apache Solr, see 
https://issues.apache.org/jira/browse/SOLR-3816



Regards,

Nagendra Nagarajayya
http://solr-ra.tgels.org
http://elasticsearch-ra.tgels.org
http://rankingalgorithm.tgels.org



Re: java.lang.IllegalArgumentException: ./collection_shard2_1/data/index does not exist

2013-01-27 Thread Prashant Saraswat
Thanks Marcin. I found your post via a google search and there was no reply
attached to it so I thought no one replied. Apologies and thanks again.


On Sat, Jan 26, 2013 at 6:48 PM, Marcin Rzewucki mrzewu...@gmail.comwrote:

 Hi,

 Actually Mark Miller replied to this issue and it seems to be fixed in Solr
 4.1 as far as I checked. Anyway, it was harmless both for querying and
 indexing.

 Regards.

 On 26 January 2013 20:14, Prashant Saraswat 
 prashant.saras...@pixalsoft.com
  wrote:

  Hi Guys,
 
  We are using Solr 4.0 in a 2 shard cluster with replication enabled. On
  solr startup we get an exception like this:
  WARNING: Could not getStatistics on info bean org.apache.solr.handler.ReplicationHandler
  java.lang.IllegalArgumentException: ./collectionOne_shard2_1/data/index does not exist
  at org.apache.commons.io.FileUtils.sizeOfDirectory(FileUtils.java:2074)
  at org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:477)
  at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:525)
  at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:231)
  at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:321)...
 
  This directory doesn't exist. But we do have a directory like this:
  index.1234...
 
  Indexing and search seem to be fine. Can someone confirm that this is
  harmless?
 
  Marcin Rzewucki asked the same question on December 28 2012 and got no
  response. Can someone kindly respond please?
 
  Thanks
  PixalSoft
 



secure Solr server

2013-01-27 Thread Mingfeng Yang
Before Solr 4.0, I secured Solr by enabling password protection in Jetty.
However, password protection will make SolrCloud not work.

We use EC2 now, and we need the web admin interface of Solr to be
accessible (with password) from anywhere.

How do you protect your Solr server from unauthorized access?

Thanks,
Ming


Re: secure Solr server

2013-01-27 Thread Isaac Hebsh
You can define a security filter in WEB-INF/web.xml, on specific URL
patterns.
You might want to set the URL pattern to /admin/*.

[find examples here:
http://stackoverflow.com/questions/7920092/how-can-i-bypass-security-filter-in-web-xml
]
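As an illustration only, a container-managed constraint in web.xml could look
roughly like this (the role and realm names are placeholders; the users and
roles themselves are defined in your servlet container, e.g. a Jetty realm or
Tomcat users file):

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>Solr admin</web-resource-name>
        <url-pattern>/admin/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-admin</role-name>
      </auth-constraint>
    </security-constraint>

    <login-config>
      <auth-method>BASIC</auth-method>
      <realm-name>Solr Realm</realm-name>
    </login-config>

    <security-role>
      <role-name>solr-admin</role-name>
    </security-role>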


On Sun, Jan 27, 2013 at 8:07 PM, Mingfeng Yang mfy...@wisewindow.comwrote:

 Before Solr 4.0, I secure solr by enable password protection in Jetty.
  However, password protection will make solrcloud not work.

 We use EC2 now, and we need the www admin interface of solr to be
 accessible (with password) from anywhere.

 How do you protect your solr sever from unauthorized access?

 Thanks,
 Ming



RE: The best way to delete several documents?

2013-01-27 Thread Harshvardhan Ojha
Hi Bruno,

Why don't you write a deletedPkQuery to delete these documents and set up a cron 
job to run a delta-import every Friday?
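Roughly like this (illustrative sketch only; the table, column and handler names
are placeholders for whatever your DataImportHandler setup uses):

    <!-- data-config.xml: deletedPkQuery runs during delta-import and every
         primary key it returns is removed from the index -->
    <entity name="doc" pk="id"
            query="SELECT id, title FROM docs"
            deltaQuery="SELECT id FROM docs WHERE updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM docs WHERE id='${dataimporter.delta.id}'"
            deletedPkQuery="SELECT id FROM deleted_docs WHERE deleted_at &gt; '${dataimporter.last_index_time}'"/>

    # crontab entry: trigger a delta-import every Friday at 23:00
    0 23 * * 5 curl -s "http://localhost:8983/solr/dataimport?command=delta-import&commit=true"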

Regards
Harshvardhan Ojha

-Original Message-
From: Bruno Mannina [mailto:bmann...@free.fr]
Sent: Sunday, January 27, 2013 6:03 PM
To: solr-user@lucene.apache.org
Subject: Re: The best way to delete several documents?

yep ok thks !

Le 27/01/2013 13:27, Marcin Rzewucki a écrit :
 You can write a script and remove say 50 docs in 1 call. It's always
 better than removing 1 by 1.
 Regards.


 On 27 January 2013 13:17, Bruno Mannina bmann...@free.fr wrote:

 Hi,

 Even If I have one or two thousands of Id ?

 Thanks

 Le 27/01/2013 13:15, Marcin Rzewucki a écrit :

 Hi,

 The best is if you could find a query for all docs you want to
 remove. If this is not simple you can use the following syntax: id:
 (1 2 3 4 5) to remove group of docs by ID (and if your default query 
 operator is OR).

 Regards.

 On 27 January 2013 11:47, Bruno Mannina bmann...@free.fr wrote:

   Dear Solr users,
 Every Friday I need to delete some documents on my solr db (around
 100~200
 docs).

 Could you help me to choose the best way to delete these documents.
 - I have the unique ID of each documents

 Another question:
 How can I disable to possibility to do:

 http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true

 by using a webbrowser.

 I would like to do operations on my DB only if use a command line
 like java -jar -DUrl=

 http://localhost:8983/solr/update?stream.body=<delete><query>id:298253</query></delete>&commit=true
 post.jar


 is it possible?

 Thanks a lot,
 Bruno






Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs

2013-01-27 Thread Rahul Bishnoi
Hi Shawn,

Thanks for your reply. After following your suggestions we were able to
index 30k documents. I have some questions:

1) What is stored in RAM while only indexing is going on? How do we
calculate the RAM/heap requirements for our documents?
2) The document cache, filter cache, etc. are populated while querying.
Correct me if I am wrong. Are there any caches that are populated while
indexing?

Thanks,
Rahul



On Sat, Jan 26, 2013 at 11:46 PM, Shawn Heisey s...@elyograg.org wrote:

 On 1/26/2013 12:55 AM, Rahul Bishnoi wrote:

 Thanks for quick reply and addressing each point queried.

 Additional asked information is mentioned below:

 OS = Ubuntu 12.04 (64 bit)
 Sun Java 7 (64 bit)
 Total RAM = 8GB

 SolrConfig.xml is available at http://pastebin.com/SEFxkw2R


 Rahul,

 The MaxPermGenSize could be a contributing factor.  The documents where
 you have 1000 words are somewhat large, though your overall index size is
 pretty small.  I would try removing the MaxPermGenSize option and see what
 happens.  You can also try reducing the ramBufferSizeMB in solrconfig.xml.
  The default in previous versions of Solr was 32, which is big enough for
 most things, unless you are indexing HUGE documents like entire books.

 It looks like you have the cache sizes under query at values close to
 default.  I wouldn't decrease the documentCache any - in fact an increase
 might be a good thing there.  As for the others, you could probably reduce
 them.  The filterCache size I would start at 64 or 128.  Watch your cache
 hitratios to see whether the changes make things remarkably worse.

 If that doesn't help, try increasing the -Xmx option - first 3072m, next
 4096m.  You could go as high as 6GB and not run into any OS cache problems
 with your small index size, though you might run into long GC pauses.

 Indexing, especially big documents, is fairly memory intensive.  Some
 queries can be memory intensive as well, especially those using facets or a
 lot of clauses.

 Under normal operation, I could probably get away with a 3GB heap size,
 but I have it at 8GB because otherwise a full reindex (full-import from
 mysql) runs into OOM errors.

 Thanks,
 Shawn




Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs

2013-01-27 Thread Shawn Heisey

On 1/27/2013 10:28 PM, Rahul Bishnoi wrote:

Thanks for your reply. After following your suggestions we were able to
index 30k documents. I have some queries:

1) What is stored in the RAM while only indexing is going on?  How to
calculate the RAM/heap requirements for our documents?
2) The document cache, filter cache, etc...are populated while querying.
Correct me if I am wrong. Are there any caches that are populated while
indexing?


If anyone catches me making statements that are not true, please feel 
free to correct me.


The caches are indeed only used during querying.  If you are not making 
queries at all, they aren't much of a factor.


I can't give you any definitive answers to your question about RAM usage 
and how to calculate RAM/heap requirements.  I can make some general 
statements without looking at the code, just based on what I've learned 
so far about Solr, and about Java in general.


You would have an exact copy of the input text for each field initially, 
which would ultimately get used for the stored data (for those fields 
that are stored).  Each one is probably just a plain String, though I 
don't know as I haven't read the code.  If the field is not being stored 
or copied, then it would be possible to get rid of that data as soon as 
it is no longer required for indexing.  I don't have any idea whether 
Solr/Lucene code actually gets rid of the exact copy in this way.


If you are storing termvectors, additional memory would be needed for 
that.  I don't know if that involves lots of objects or if it's one 
object with index information.  Based on my experience, termvectors can 
be bigger than the stored data for the same field.


Tokenization and filtering is where I imagine that most of the memory 
would get used.  If you're using a filter like EdgeNGram, that's a LOT 
of tokens.  Even if you're just tokenizing words, it can add up.  There 
is also space required for the inverted index, norms, and other 
data/metadata.  If each token is a separate Java object (which I do not 
know), there would be a fair amount of memory overhead involved.


A String object in java has something like 40 bytes of overhead above 
and beyond the space required for the data.  Also, strings in Java are 
internally represented in UTF-16, so each character actually takes two 
bytes.


http://www.javamex.com/tutorials/memory/string_memory_usage.shtml

The finished documents stack up in the ramBufferSizeMB space until it 
gets full or a hard commit is issued, at which point they are flushed to 
disk as a Lucene segment.  One thing that I'm not sure about is whether 
an additional ram buffer is allocated for further indexing while the 
flush is happening, or if it flushes and then re-uses the buffer for 
subsequent documents.
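For reference, these are the solrconfig.xml settings involved; the values below 
are only illustrative, not recommendations:

    <indexConfig>
      <!-- documents are buffered in RAM up to roughly this size, then flushed
           to a new segment on disk -->
      <ramBufferSizeMB>32</ramBufferSizeMB>
    </indexConfig>

    <updateHandler class="solr.DirectUpdateHandler2">
      <!-- a hard autoCommit also forces a flush, bounding how long documents
           sit only in the RAM buffer and transaction log -->
      <autoCommit>
        <maxTime>300000</maxTime>
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>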


Another way that it can use memory is when merging index segments.  I 
don't know how much memory gets used for this process.


On Solr 4 with the default directory factory, part of a flushed segment 
may remain in RAM until enough additional segment data is created.  The 
amount of memory used by this feature should be pretty small, unless you 
have a lot of cores on a single JVM.  That extra memory can be 
eliminated by using MMapDirectoryFactory instead of 
NRTCachingDirectoryFactory, at the expense of fast Near-RealTime index 
updates.
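If you do want to make that trade-off, it's the directoryFactory setting in 
solrconfig.xml, something like this (the first form is what the Solr 4 example 
config ships with, as far as I recall):

    <!-- Solr 4 example config default: caches small flushed segments in RAM
         so they can be searched quickly for NRT -->
    <directoryFactory name="DirectoryFactory"
                      class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

    <!-- alternative: plain memory-mapped index files, no extra NRT cache -->
    <directoryFactory name="DirectoryFactory"
                      class="solr.MMapDirectoryFactory"/>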


Thanks,
Shawn



[SOLR 4.0] Number of fields vs searching speed

2013-01-27 Thread Roman Slavik
Hi guys,
what is the relation between the number of indexed fields and searching speed?

For example, I have the same number of records and the same Solr search query, but
100 indexed fields for each record in case 1 and 1000 fields in case 2. It's
obvious that the search time in case 2 will be greater, but by how much? 10
times? Or is there another relation between the number of indexed fields and
search time?

Thanks a lot!


