1. It is designed behavior (true for CouchDB also).

2. Again, designed behavior. They’ll be closed if the database closes, which
happens when you exceed the size of the LRU.

3. Oooh. That’s a mistake on your part, and it explains it all. You must run
_view_cleanup on :5984. The clustering code has to be invoked there, as the
design document could be on a remote node, and it holds the logic to calculate
which shard filenames to retain, etc.
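For readers following point 3: the keep/delete decision in cleanup_index_files/1 (quoted at the bottom of this thread) can be pulled out into a small pure function and tried in an Erlang shell. This is a minimal sketch, not BigCouch code. The module view_cleanup_sketch, the function files_to_delete/2, and the sample signatures and paths are all hypothetical, and the original "if length(Sigs) =:= 0" branch is written here as a separate clause. The sketch shows two things. A file survives only if some live signature appears somewhere in its path, because the pattern is unanchored. An empty signature list means every index file is deleted.

```erlang
%% Hypothetical module: a standalone copy of the DeleteFiles logic in
%% cleanup_index_files/1, with the design-doc and directory lookups
%% replaced by plain arguments.
-module(view_cleanup_sketch).
-export([files_to_delete/2]).

%% Sigs:     signatures (strings) of the live design docs
%% FileList: index file paths found on disk
%% Returns the paths that would be deleted.
files_to_delete([], FileList) ->
    %% No signatures at all: nothing is kept, so everything is deleted.
    FileList;
files_to_delete(Sigs, FileList) ->
    %% One alternation such as "(sig1|sig2)" that matches any live sig.
    RegExp = "(" ++ string:join(Sigs, "|") ++ ")",
    %% A path is selected for deletion only if no live signature
    %% occurs anywhere in it (the pattern is not anchored).
    [Path || Path <- FileList,
             re:run(Path, RegExp, [{capture, none}]) =:= nomatch].
```

For example, files_to_delete(["5e1f2a"], ["a/5e1f2a.view", "a/9c04d7.view"]) returns ["a/9c04d7.view"]. Calling files_to_delete([], ...) on the same list returns both paths.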

B.
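Point 2 and observation 6 below concern deleted view files that still show up in lsof. That is ordinary POSIX unlink behaviour: deleting a file removes its name, but an already-open descriptor keeps the file alive until that descriptor is closed. What follows is a minimal sketch, assuming a Linux-like filesystem such as ext4 (Windows behaves differently). The module unlink_open_sketch and the scratch path are hypothetical.

```erlang
%% Hypothetical demo module: deleting a file by name does not invalidate
%% a descriptor that is already open on it (POSIX unlink semantics,
%% e.g. Linux/ext4).
-module(unlink_open_sketch).
-export([demo/1]).

%% Path: a scratch file path, e.g. "/tmp/unlink_open_sketch.view".
demo(Path) ->
    ok = file:write_file(Path, <<"index data">>),
    {ok, Fd} = file:open(Path, [read, binary, raw]),
    %% Remove the directory entry while the descriptor is still open.
    ok = file:delete(Path),
    %% The name is gone...
    false = filelib:is_file(Path),
    %% ...but the open descriptor still reads the old contents, which is
    %% why lsof keeps listing such a file (marked "(deleted)") until close.
    {ok, Data} = file:read(Fd, 1024),
    ok = file:close(Fd),
    Data.
```

Calling demo("/tmp/unlink_open_sketch.view") returns <<"index data">> even though the file no longer exists when it is read. The storage is only freed once the last open descriptor is closed.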

On 3 Feb 2014, at 14:39, Vladimir Ralev <[email protected]> wrote:

> Hello again.
> 
> I did some more testing, and here are some observations. I am analyzing this
> from the perspective of running 1000 databases with 30 views each.
> BigCouch will partition the databases into smaller DBs, about 10000
> databases in total per machine. Each of these will have 30 views, so 300K+
> files in the directory structure in total, per machine.
> 
> What I see in a smaller-scale test is the following:
> 1. Initially the views are not generated; only when you access the view
> http://host:5984/aea8b710ab5f0/desgn/etc.. is the view file built
> from scratch.
> 2. Once you access the view file this way, the file handles to this file
> are kept open forever by the beam.smp process. They are never closed until
> BigCouch is restarted. The couchjs process terminates and releases the
> handle while indexing.
> 3. If you run
> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup,
> the views are always deleted.
> 4. If you run http://host:5984/aea8b710ab5f0/_view_cleanup, the views are
> NOT deleted. I guess that's the correct cleanup I should use.
> 5. If you restart BigCouch to force the file handle to close, and make no
> read request to that view (which would open the file handle), BigCouch will
> slowly start to open files and never close them again until the next
> restart.
> 6. When you delete files with
> http://host:5986/shards%2F00000000-1fffffff%2Faea8b710ab5f0.1385154105/_view_cleanup,
> Erlang's file:delete is used, which doesn't care about open file handles;
> it just deletes by name, so the deleted files remain referenced and the
> handles can still be seen in lsof. The cycle of deleting and rebuilding
> these files never stops, and the descriptors leak.
> 
> Do these observations make sense?
> 
> I think 300K+ handles is manageable as long as it doesn't recycle
> constantly, but I need to understand the correct _view_cleanup REST API to
> use. Is http://host:5984 sufficient?
> 
> I added some logging on file close and so on, and it's mostly called on db
> files. I couldn't trace any point that releases a view file handle. If
> you can point me to the code that may release it, I can check.
> 
> Thanks a lot for any feedback.
> 
> 
> On Sat, Feb 1, 2014 at 6:41 AM, Vladimir Ralev <[email protected]> wrote:
> 
>> Not sure at all. I don't know how to check precisely whether a live design
>> doc is pointing to a particular file. I was basing my statement on the fact
>> that I have my views declared and they were available pre-indexed before
>> compaction (they were not physically opened as file handles by couch;
>> they were opened on demand). Once I finish my current script, I will
>> test everything again and will spend some time tracing the code.
>> 
>> 
>> On Fri, Jan 31, 2014 at 6:52 PM, Robert Samuel Newson <[email protected]> wrote:
>> 
>>> 
>>> Ownership is interesting. Would the bigcouch user have the right to
>>> delete the file but not open it for reading?
>>> 
>>> There's definitely an issue in bigcouch (fixed long since in couchdb)
>>> where any failure to open a view file makes us delete it.
>>> 
>>> OS/fs all check out fine. You see the filename that should be retained in
>>> that log output? You're 100% sure? You do have a live design doc pointing
>>> to it?
>>> 
>>> B.
>>> 
>> On 31 Jan 2014, at 16:39, Vladimir Ralev <[email protected]> wrote:
>>> 
>>>> Thanks a lot. The database was moved from older machines, so some other
>>>> file system metadata might be scrambled. But I don't see what could cause
>>>> a problem like this.
>>>> 
>>>> Yes, the debug output "deleting unused view index files:" is seen, and it
>>>> deletes every view in every database, little doubt about it. It doesn't
>>>> delete the fresh views that are fully regenerated afterwards, though. I
>>>> think the original views somehow got corrupted, but I need to figure out
>>>> why and maybe fix it manually with a script.
>>>> 
>>>> OS is 64-bit Debian, file system is ext4. There is a little scramble of
>>>> the file ownership: some directories are owned by the old bigcouch user,
>>>> others by root, so that's one thing I am investigating. I reset the
>>>> ownership, but will have to repeat it for my next tests.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jan 31, 2014 at 6:21 PM, Robert Samuel Newson <[email protected]> wrote:
>>>> 
>>>>> and details of OS, filesystem, anything you think might be relevant.
>>>>> 
>>>>> B.
>>>>> 
>>>>> On 31 Jan 2014, at 16:20, Robert Samuel Newson <[email protected]> wrote:
>>>>> 
>>>>>> First thing to note is that BigCouch development is over, but we can at
>>>>>> least confirm this:
>>>>>> 
>>>>>> This function fetches all the design docs of the database, grabs all the
>>>>>> signatures from each (you'll have noticed view filenames look
>>>>>> uuid/randomy; that's a 'sig'), and then sweeps the dir where all views
>>>>>> for the given database should be and deletes those not in the 'keep'
>>>>>> list.
>>>>>> 
>>>>>> Can you enable debug-level logging (curl localhost:5984/_config/log/level
>>>>>> -X PUT -d '"debug"' on *all* bigcouch nodes) and tell us whether
>>>>>> 
>>>>>> ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
>>>>>> 
>>>>>> actually gets printed?
>>>>>> 
>>>>>> B.
>>>>>> 
>>>>>> On 31 Jan 2014, at 16:09, Vladimir Ralev <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi guys,
>>>>>>> 
>>>>>>> bigcouch 0.4.2 has the following code that handles view cleanup:
>>>>>>> 
>>>>>>> cleanup_index_files(Db) ->
>>>>>>>     % load all ddocs
>>>>>>>     {ok, DesignDocs} = couch_db:get_design_docs(Db),
>>>>>>>
>>>>>>>     % make unique list of group sigs
>>>>>>>     Sigs = lists:map(fun(#doc{id = GroupId}) ->
>>>>>>>         {ok, Info} = get_group_info(Db, GroupId),
>>>>>>>         ?b2l(couch_util:get_value(signature, Info))
>>>>>>>     end, [DD||DD <- DesignDocs, DD#doc.deleted == false]),
>>>>>>>
>>>>>>>     FileList = list_index_files(Db),
>>>>>>>
>>>>>>>     DeleteFiles =
>>>>>>>     if length(Sigs) =:= 0 ->
>>>>>>>         FileList;
>>>>>>>     true ->
>>>>>>>         % regex that matches all ddocs
>>>>>>>         RegExp = "("++ string:join(Sigs, "|") ++")",
>>>>>>>
>>>>>>>         % filter out the ones in use
>>>>>>>         [FilePath || FilePath <- FileList,
>>>>>>>             re:run(FilePath, RegExp, [{capture, none}]) =:= nomatch]
>>>>>>>     end,
>>>>>>>
>>>>>>>     % delete unused files
>>>>>>>     ?LOG_DEBUG("deleting unused view index files: ~p",[DeleteFiles]),
>>>>>>>     RootDir = couch_config:get("couchdb", "view_index_dir"),
>>>>>>>     [couch_file:delete(RootDir,File,false)||File <- DeleteFiles],
>>>>>>>     ok.
>>>>>>>
>>>>>>> From here:
>>>>>>> https://github.com/cloudant/bigcouch/blob/master/apps/couch/src/couch_view.erl#L84
>>>>>>> 
>>>>>>> It's supposed to delete only unused views, but in my case it deletes
>>>>>>> everything and then starts building from scratch. Can you help me
>>>>>>> understand the condition used here to filter the files that are
>>>>>>> currently in use? How is the regex supposed to work?
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 
