Wow, that works! I really appreciate the help! 🎉🎉🎉
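
For the archive, here is a minimal sketch of the pattern for the many-column case, where the named list is built programmatically instead of typed out. It assumes an arrow version whose schema() evaluates its arguments with rlang, as Ian describes below. The col_names vector is a hypothetical stand-in for the column names of a real matrix, and uint16() for every column is just the type from my example:

```r
library(arrow)

# Hypothetical column names; in practice these could come from colnames() of the matrix.
col_names <- c("SRR12", "SRR20", "SRR24", "SRR27")

# Build a named list with one data type per column (uint16 for all of them here).
schema_fields <- setNames(
  lapply(col_names, function(nm) uint16()),
  col_names
)

# Splice the named list into schema() with !!! so each element is passed
# as its own named argument.
sample_schema <- schema(!!!schema_fields)

# Same example data frame as in the original message.
sample_df <- data.frame(
  SRR12 = c(0),
  SRR20 = c(0),
  SRR24 = c(4),
  SRR27 = c(223),
  row.names = c("ENSG3")
)

# Create the table with the programmatically built schema.
sample_table <- Table$create(sample_df, schema = sample_schema)
```

The only change from my failing attempt is the !!! in front of schema_fields; the list itself is built the same way, just with lapply() so it scales to thousands of columns.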

Aldrin Montana
Computer Science PhD Student
UC Santa Cruz


On Tue, Aug 17, 2021 at 3:17 PM Ian Cook <[email protected]> wrote:

> Hi Aldrin,
>
> Please try this:
>
> sample_schema <- schema(!!!schema_fields)
>
> The schema() function now uses rlang functions to evaluate its arguments,
> so variable names need to be unquoted and spliced with !!!
>
> Ian
>
>
> On Tue, Aug 17, 2021 at 5:22 PM Aldrin <[email protected]> wrote:
>
>> Hello!
>>
>> I am pretty confused by the schema factory function in R, because I think
>> what I'm doing should work, but it doesn't seem to. I have inlined the code
>> below, but if there's an alternate way to setting the data types of a
>> schema in R, then I would welcome recommendations for those as well.
>>
>> Anyways, the brief overview is that I want to create tables from matrices
>> that will have anywhere from hundreds of columns to thousands, and
>> specifying the schema inline is not going to be useful. I figure I should
>> be able to create a named list and then pass it to the schema factory
>> function, but I always get an error when trying to do so ("Error:
>> !is.null(nms <- names(.list)) is not TRUE").
>>
>> I could update to arrow 5.0.0, but I assume that my problem shouldn't be
>> a problem in arrow 4.0.1.
>>
>> Thanks for any help!
>>
>> Working code:
>>
>> Create an example data frame:
>> sample_df <- data.frame(
>>      SRR12=c(0)
>>     ,SRR20=c(0)
>>     ,SRR24=c(4)
>>     ,SRR27=c(223)
>>     ,row.names=c('ENSG3')
>> )
>>
>> sample_df
>>
>>>       SRR12 SRR20 SRR24 SRR27
>>> ENSG3     0     0     4   223
>>
>>
>> Create an arrow table, specify the schema inline:
>> sample_table <- Table$create(
>>      sample_df
>>     ,schema=schema(
>>          SRR12=uint16()
>>         ,SRR20=uint16()
>>         ,SRR24=uint16()
>>         ,SRR27=uint16()
>>     )
>> )
>>
>> sample_table
>>
>>> Table
>>> 1 rows x 4 columns
>>> $SRR12 <uint16>
>>> $SRR20 <uint16>
>>> $SRR24 <uint16>
>>> $SRR27 <uint16>
>>>
>>
>> Create a schema from a list, because we want > 1000 columns sometimes:
>> schema_fields <- list(SRR12=uint16(), SRR20=uint16(), SRR24=uint16(),
>> SRR27=uint16())
>> sample_schema <- schema(schema_fields)
>>
>>> Error: !is.null(nms <- names(.list)) is not TRUE
>>>
>>
>> schema_fields
>>
>>> $SRR12
>>> UInt16
>>> uint16
>>>
>>> $SRR20
>>> UInt16
>>> uint16
>>>
>>> $SRR24
>>> UInt16
>>> uint16
>>>
>>> $SRR27
>>> UInt16
>>> uint16
>>
>>
>>
>> Package information (system is macbook M1):
>> > brew info apache-arrow
>>
>> apache-arrow: stable 5.0.0 (bottled), HEAD
>> Columnar in-memory analytics layer designed to accelerate big data
>> https://arrow.apache.org/
>> /opt/homebrew/Cellar/apache-arrow/4.0.1_2 (534 files, 92.9MB) *
>>   Poured from bottle on 2021-07-07 at 16:10:51
>> From:
>> https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/apache-arrow.rb
>> License: Apache-2.0
>> ==> Dependencies
>> Build: boost ✔, cmake ✘, llvm ✘
>> Required: brotli ✔, glog ✔, grpc ✘, lz4 ✔, numpy ✘, [email protected] ✔,
>> protobuf ✔, [email protected] ✔, rapidjson ✔, re2 ✘, snappy ✔, thrift ✔,
>> utf8proc ✔, zstd ✔
>> ==> Options
>> --HEAD
>>   Install HEAD version
>> ==> Analytics
>> install: 1,715 (30 days), 5,687 (90 days), 18,191 (365 days)
>> install-on-request: 994 (30 days), 3,232 (90 days), 10,314 (365 days)
>> build-error: 0 (30 days)
>>
>>
>> > arrow::arrow_info()
>>
>> Arrow package version: 4.0.1
>>
>> Capabilities:
>>
>> dataset    TRUE
>> parquet    TRUE
>> s3        FALSE
>> utf8proc   TRUE
>> re2        TRUE
>> snappy     TRUE
>> gzip       TRUE
>> brotli     TRUE
>> zstd       TRUE
>> lz4        TRUE
>> lz4_frame  TRUE
>> lzo       FALSE
>> bz2        TRUE
>> jemalloc   TRUE
>> mimalloc  FALSE
>>
>> Memory:
>>
>> Allocator jemalloc
>> Current   256 bytes
>> Max       2.31 Kb
>>
>> Runtime:
>>
>> SIMD Level          none
>> Detected SIMD Level none
>>
>>
>>
>> Aldrin Montana
>> Computer Science PhD Student
>> UC Santa Cruz
>>
>
